beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.74k stars 1.82k forks source link

Alt. Rock Alternative Rock Alternative-Rock are not the same folder #389

Closed ghost closed 10 years ago

ghost commented 10 years ago

I'm using the lastgenre plugin. This seems to have a bug I have genres Alt. Rock, Alternative Rock and Alternative-Rock Isn't this all the same?

  # Automatically migrated from legacy .beetsconfig.                              

  directory: /tank/music                                                          
  library: ~/data/musiclibrary.blb                                                
  path_format: $genre/$artist/$album/$track $title                                

  paths:                                                                          
      default: $genre/$albumartist/$album/$track $title                           
      singleton: Singletons/$artist - $title                                      
      comp: $genre/$album/$track $title                                           
      albumtype:soundtrack: Soundtracks/$album/$track $title                      

  import:                                                                         
      move: true                                                                  
      copy: false                                                                 
  plugins:                                                                        
      fetchart                                                                    
      lastgenre                                                                   
      missing                                                                     

  acoustid:                                                                       
      apikey: foobar                                                            

  fetchart:                                                                       
      auto: true                                                                  

  lastgenre:                                                                      
      force: true                                                                 
      canonical: '' 
Kraymer commented 10 years ago

Well, yes but the plugin has no way to know that unless you tell him. Copy https://github.com/sampsyo/beets/blob/master/beetsplug/lastgenre/genres-tree.yaml on your hard drive, and edit it so that Alt. Rock and Alternative Rock are children of Alternative-Rock. Use that file for the canonicalization option + make use of the whitelist option.

ps: I wrote about that genres canonicalization feature on my blog

ghost commented 10 years ago

This does not work for me. I did beet lastgenre -f beet move Still have the folders

27 lastgenre:                                                                      
 28     force: yes                                                                  
 29     canonical: /home/markus/.config/beets/genres-tree.yaml                      
 30     whitelist: /home/markus/.config/beets/genres.txt     

genres-tree.yaml

323 - rock:                                                                         
324     - alternative rock:                                                         
325         - alt. rock                                                             
326         - alternative-rock

genres.txt

  23 alternative rock                                                                
  24 alt. rock                                                                       
  25 alternative-rock
sampsyo commented 10 years ago

Are you sure these genres are not already present on the files? Lastgenre will, by default, not change the existing genre if it can't find a suitable genre for a given album.

Set the "fallback" option if you instead want to clear these genres.

To help see what's going on, it would be helpful to see the output of the lastgenre command, which indicates the source for each genre it applies.

sampsyo commented 10 years ago

Closing for now, since the user seems to have disappeared.

marlemion commented 10 years ago

Hi. I have exactly the same problem, but I found out, that with the command chain:

beet lastgenre (-f) beet move

beet did not move the files, but copied them. My Problem was that canoncicalisation did not work with the default entry (canonical: '') for some reason. So I had to copy the genres-tree.yaml file and put it in (canonical: '/path/to/file'). After that, I ran beet lastgenre and could verify that the obscure genres dissappeared and had replaced by other, more generic ones.

Then I tried to move the files with beet move, but now I have duplicated any album, which had different genre entries before and after the lastgenre command.

This is more or less a little desaster, because now I have to find all the duplicates and remove them.

sampsyo commented 10 years ago

@marlemion Are you saying that "beet move" copies files every time, even without the copy flag? That's very troublesome and a separate issue. Can you open another bug describing your setup?

marlemion commented 10 years ago

Sorry. False alarm. I moved the entire collection to another directory via beet move and it didn't leave anything in the old directory. So no duplication.

But I reckoned why I actually thought it would have been duped. I want canonicalization with lastgenre and put the albums in directories with the genres as name. However, canonicalization does not work. I have tried canonical: '' and I have tried to use a separate file as indicated above.

As I am on Gentoo, beets ist at 1.3.3 right now. Might be the version the problem?

marlemion commented 10 years ago

My config for lastgenre:

lastgenre: whitelist: /usr/lib/python2.7/site-packages/beetsplug/lastgenre/genres.txt min_weight: 10 count: 1 fallback: Unknown canonical: /usr/lib/python2.7/site-packages/beetsplug/lastgenre/genres-tree.yaml source: 'artist' force: True auto: True

genres-tree.yaml:

beet -v lastgenre -f galliano:

data directory: /home/XXXX/.config/beets library database: /mnt/sda/Music/musiclibrary.blb library directory: /mnt/sda/Music lastfm.tag (min. 10): acid jazz [100] genre for album Galliano - : 4 (artist): Acid Jazz Sending event: database_change Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: write Sending event: cli_exit

What's wrong?

marlemion commented 10 years ago

Well, I tracked it down in the plugin: in the function 'find_parents' the script searches for candidate chains including the obtained genre. In my example this is:

['electronic', 'electronica', 'downtempo', u'acid jazz']

After that, it tries to get the parent from this chain. What it is actually doing in function '_strings_to_genre' is to get one genre and check whether it is in the whitelist (_is_allowed(parent)) and if it is, it breaks.

However, it finds acid jazz at first and then breaks. This is not how it should work, is it?

Does it mean I would have to remove Acid Jazz from the Whitelist?

marlemion commented 10 years ago

Removing acid jazz from the whitelist resulted in the lastfm tag to be chosen as 'funk' which is apparently the next tag offered by lastfm. So blacklisting is not the solution. There is something fishy with canonicalization and I am puzzled that nobody encounters it but me.

Sorry for spamming btw.

marlemion commented 10 years ago

When I use 'reversed(find_parents(XXX)), it grabs the generic genre before the existing.

sampsyo commented 10 years ago

Aha—do you think reversing the list is the right solution? If so, do you think you can open a pull request? This might be simple enough that the in-browser "edit" button could suffice to make the PR.

Kraymer commented 10 years ago

Because the c14n is supposed to return the most accurate whitelisted genre, I don't think reversing the list will work in the general case (ie multiple whitelisted genres along the branch).
Can you give me the album name so I can try to reproduce the pb ?

marlemion commented 10 years ago

Yeah, right, reversed order would always result in the very basic genres. Is there a way to adjust the level of genres? As far as I understood the genres-tree.yaml file, there are several layers of genres, e.g. very basic, basic, less basic, specific, more specific, ridicilous specific. If I would like to have the 'basic' genres only, because very basic would be too generic and less basic too specialized, is there any way to achieve this with the current implementation?

The link to the album at discogs: http://www.discogs.com/Galliano-4/release/166950 and on musicbrainz: http://musicbrainz.org/release/e47312ec-43b2-4efa-a7af-a6a37a8be5d5

Thanks for your help, guys, beets rocks. :)

Kraymer commented 10 years ago

Whether you qualify a genre "basic" or "ridiculous specific" is subjective and depends mainly of how deep you are into a given genre ;) Someone who have thousands of jazz files may want to distinct "acid jazz" and, say, "chill out". It's likely that you want to be more specific in one genre area than another, that's why there is not a global genre-acurateness option.

The use case I had in mind when coding c14n is that you have the given set of genres of your library and you want to make sure any new file imported will use one of those. To achieve this, you edit genres-tree.yaml so that each genre become the root of the branch it belongs to. Because you enforced only one allowed genre per branch (the root) this tree structure permits to funnel non-allowed genres into allowed genres.

I wrote a blog post about that issue a while ago, you may want to read it, as it has some illustrations that hopefully make my reasoning more clear.

Kraymer commented 10 years ago

From you config i see :

whitelist: /usr/lib/python2.7/site-packages/beetsplug/lastgenre/genres.txt

Have you edited the file ?
By default it contains all genres but you want to strip the content to list only the genres you want to handle in your library. eg if the file contains only electronica then the algo will do acid jazz => downtempo => electronica and use the latter.

marlemion commented 10 years ago

Tried this. See my 4th post. It does not follow your described behavior. Unfortunately...

marlemion commented 10 years ago

I will have to think about modifying the tree-fil as suggested in your blog post, but I still don't see the use for the different layers in the tree file then. Anyway, I'll be away until next Tuesday, so this will have to wait. :(

marlemion commented 10 years ago

HI, it is me again. I have followed your advise and modified the genres.txt to include some very generic genres, for example 'dance' and 'hip hop & rap'.

It works quite ok right now, but still there are issues I don't understand. For example, I try to get the right genre for 'ASD', a German rap combo. But instead, it sorts it into 'dance'. I have modified the lastgenre plugin code to be more verbose, especially to show all tags and weight such as read from the last.fm webpage:

res_tag: deutschrap (100) res_tag: hip hop (100) res_tag: hip-hop (97) res_tag: german (89) res_tag: rap (78) res_tag: german hiphop (51) res_tag: samy deluxe (19) res_tag: german hip hop (14) res_tag: german hip-hop (14) res_tag: afrob (12) res_tag: deutsch (12) res_tag: hamburg (12) res_tag: deutscher hip hop (10) res_tag: asd (10) res_tag: deutsch rap (10) res_tag: hiphop (10) res_tag: german rap (10) res_tag: hamburgs finest (8) res_tag: deutscher sprechgesang (6) res_tag: german artists (6) res_tag: hamburg hip-hop (6) res_tag: old school (6) res_tag: stuttgart (6) res_tag: deutscher hiphop (4) res_tag: geht ab (4) res_tag: kolchose (4) res_tag: stuttgart hip-hop (4) res_tag: splash festival (4) res_tag: kultrap (4) res_tag: hip hop mit niveau (4) res_tag: deutsch-hip-hop (4) res_tag: deutscha hip-hop (4) res_tag: rocken roll : (4) res_tag: germany (4) res_tag: real shit (4) res_tag: male vocalists (4) res_tag: german bands (4) res_tag: hamburger schule (4) res_tag: smooth (4) res_tag: oldschool german hip-hop (4) res_tag: since (4) res_tag: italian trash (2) res_tag: no more war (2) res_tag: party (2) res_tag: band (2) res_tag: dance (2) lastfm.tag (min. 15): dance [2] tags: [u'dance'] tag: dance

Here you can see the problem: the first 4 items of tags are all rated very high, but still, it is sorted into 'dance', which is the last on the list. I have set min_weight > 2 (actuall 15), but it still sorts it in dance. hip-hop or hip hop should be funneled into 'hip hop & rap':

Any thought on this? It definitely does not behave as it should. 'dance' and 'hip hop' are even the same level, 'hip hop & rap' is one level above dance. I don't get the logic of choosing.

marlemion commented 10 years ago

Shouldn't be the logic the following:

Grab the first tag with the highest weight from lastfm. Check whether it is in the genres-tree. If not, skip it. If yes, check, if it is in the whitelist. If yes, set it. If no, funnel it down (one level or more levels). Check whether the funneled genre is in the whitelist. If yes, set it, if not, either funnel further down, if possible, or go back to the top and choose the next tag provided by lastfm.

This logic should find the first match with the whitelist and the highest weight, which is canonicalized, if necessary.

Does it work like this? If I go through the code, it just throws out any tag, which is not in the whitelist. After that, it thinks about canonicalizing, if needed. In my case, it just finds 'dance' and is happy. But this is not how it should be. It should find 'hip hop & rap'.

marlemion commented 10 years ago

When I whitelist e.g. hip hop, it looks like this:

res_tag: deutschrap (100) res_tag: hip hop (100) lastfm.tag (min. 15): hip hop [100] tags: [u'hip hop'] tag: hip hop genre for album ASD - Wer hätte das gedacht? (artist): Hip Hop

You see, what it does: it checks, if 'deutschrap' is in the whitelist - it is not: skipped. Then it goes on with 'hip hop' - it is. So it goes on with its (hopefully checks for the weight, even though there is a 'min_weight > weight' statement included, which I don't get). Then it sets 'hip hop'.

But: I have canonicalization enabled! Thus, It does either not canonicalize correctly (better: at all) but just uses the whitelisted tags.

In my opinion, the code is broken. It does not, what it is supposed to do.

marlemion commented 10 years ago

I have modified the plugin to follow the logic as set out above (e.g. canonicalize down, if it is not in the whitelist and then check again):

def _tags_for(obj):
    """Given a pylast entity (album or track), returns a list of
    tag names for that entity. Returns an empty list if the entity is
    not found or another error occurs.
    """
    try:
        # Work around an inconsistency in pylast where
        # Album.get_top_tags() does not return TopItem instances.
        # https://code.google.com/p/pylast/issues/detail?id=85
        if isinstance(obj, pylast.Album):
            res = super(pylast.Album, obj).get_top_tags()
        else:
            res = obj.get_top_tags()
    except PYLAST_EXCEPTIONS as exc:
        log.debug(u'last.fm error: %s' % unicode(exc))
        return []

    tags = []
    min_weight = config['lastgenre']['min_weight'].get(int)
    count = config['lastgenre']['count'].get(int)
    dbg = []
    for el in res:
        weight = int(el.weight or 0)
        tag = el.item.get_name().lower()
        log.debug(u'res_tag: {0} ({1})'.format(
          tag,weight))

        if _is_allowed(tag):
            if min_weight > -1 and min_weight > weight and len(tags) > 0:
                return tags
            tags.append(tag)
            dbg.append(u'{0} [{1}]'.format(tag, weight))
            if len(tags) == count:
                break
        for parent in find_parents(tag, options['branches']):
            log.debug(u'res_tag:canon {0})'.format(
               parent))
            if _is_allowed(parent):
                tags.append(parent)
                dbg.append(u'{0} [{1}]'.format(tag, weight))
                if len(tags) == count:
                    break
    log.debug(u'lastfm.tag (min. {0}): {1}'.format(
        min_weight, u', '.join(dbg)
    ))
    return tags

Now it does it correctly:

res_tag: deutschrap (100) res_tag:canon deutschrap) res_tag: hip hop (100) res_tag:canon hip hop) res_tag:canon hip hop & rap) res_tag: hip-hop (97) res_tag:canon hip-hop) res_tag: german (89) res_tag:canon german) res_tag: rap (78) res_tag:canon rap) res_tag:canon hip hop & rap) res_tag: german hiphop (51) res_tag:canon german hiphop) res_tag: samy deluxe (19) res_tag:canon samy deluxe) res_tag: german hip hop (14) res_tag:canon german hip hop) res_tag: german hip-hop (14) res_tag:canon german hip-hop) res_tag: afrob (12) res_tag:canon afrob) res_tag: deutsch (12) res_tag:canon deutsch) res_tag: hamburg (12) res_tag:canon hamburg) res_tag: deutscher hip hop (10) res_tag:canon deutscher hip hop) res_tag: asd (10) res_tag:canon asd) res_tag: deutsch rap (10) res_tag:canon deutsch rap) res_tag: hiphop (10) res_tag:canon hiphop) res_tag: german rap (10) res_tag:canon german rap) res_tag: hamburgs finest (8) res_tag:canon hamburgs finest) res_tag: deutscher sprechgesang (6) res_tag:canon deutscher sprechgesang) res_tag: german artists (6) res_tag:canon german artists) res_tag: hamburg hip-hop (6) res_tag:canon hamburg hip-hop) res_tag: old school (6) res_tag:canon old school) res_tag: stuttgart (6) res_tag:canon stuttgart) res_tag: deutscher hiphop (4) res_tag:canon deutscher hiphop) res_tag: geht ab (4) res_tag:canon geht ab) res_tag: kolchose (4) res_tag:canon kolchose) res_tag: stuttgart hip-hop (4) res_tag:canon stuttgart hip-hop) res_tag: splash festival (4) res_tag:canon splash festival) res_tag: kultrap (4) res_tag:canon kultrap) res_tag: hip hop mit niveau (4) res_tag:canon hip hop mit niveau) res_tag: deutsch-hip-hop (4) res_tag:canon deutsch-hip-hop) res_tag: deutscha hip-hop (4) res_tag:canon deutscha hip-hop) res_tag: rocken roll : (4) res_tag:canon rocken roll :) res_tag: germany (4) res_tag:canon germany) res_tag: real shit (4) res_tag:canon real shit) res_tag: male vocalists (4) res_tag:canon male vocalists) res_tag: german bands (4) res_tag:canon german bands) res_tag: hamburger schule (4) res_tag:canon hamburger schule) res_tag: smooth (4) res_tag:canon smooth) res_tag: oldschool german hip-hop (4) res_tag:canon oldschool german hip-hop) res_tag: since (4) res_tag:canon since) res_tag: italian trash (2) res_tag:canon italian trash) res_tag: no more war (2) res_tag:canon no more war) res_tag: party (2) res_tag:canon party) res_tag: band (2) res_tag:canon band) res_tag: dance (2) tags: ['hip hop & rap', 'hip hop & rap'] tag: hip hop & rap tag: hip hop & rap genre for album ASD - Wer hätte das gedacht? (artist): Hip Hop & Rap

What do you think?

Kraymer commented 10 years ago

Seems you've put a lot of time debugging this and even have a working code, great ! I'm not familiar with the lastfm weighings that have been introduced recently, but you're right it seems to have broken the c14n logic. Would you mind submit your code in a PR to facilitate the review ? thks. :rabbit:

sampsyo commented 10 years ago

I second the pull request suggestion! :hatched_chick:

marlemion commented 10 years ago

Done. Thanks for your help! https://github.com/sampsyo/beets/pull/706