beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License
12.86k stars 1.82k forks source link

Genius lyrics provider not finding track #3446

Closed xhocquet closed 4 years ago

xhocquet commented 4 years ago

Hey there, first off thanks for the software, it's very useful! I'm trying to get some lyrics for a new album I added to my library. I originally noticed more 'edge' hip-hop lyrics not importing despite Genius being enabled, and Genius having a TON of rap lyrics. It does not seem like the plugin is making a request to Genius like the other providers (which do not have the song)

Specific song I am trying to import below: https://genius.com/Billy-woods-marlow-lyrics

Problem

Running this command in verbose (-vv) mode:

$ beet -vvv lyrics billy woods marlow -f
user configuration: C:\Users\meesles\AppData\Roaming\beets\config.yaml
data directory: C:\Users\meesles\AppData\Roaming\beets
plugin paths:
Sending event: pluginload
lyrics: Disabling google source: no API key configured.
library database: C:\Users\meesles\AppData\Roaming\beets\library.db
library directory: D:\Library\Music
Sending event: library_opened
lyrics: failed to fetch: http://lyrics.wikia.com/Billy_Woods:Marlow (404)
lyrics: failed to fetch: https://www.musixmatch.com/lyrics/Billy-Woods/Marlow (404)
lyrics: lyrics not found: Billy Woods - Terror Management - Marlow
Sending event: cli_exit

File: http://www.mediafire.com/file/0rfq90y1v77mejf/01_Marlow.mp3/file

Setup

My configuration (output of beet config) is:

lyrics:                                                                  
    bing_lang_from: []                                                   
    sources: google lyricwiki musixmatch genius                          
    fallback: ''                                                         
    auto: yes                                                            
    bing_client_secret: REDACTED                                         
    bing_lang_to:                                                        
    google_API_key: REDACTED                                             
    google_engine_ID: REDACTED                                           
    genius_api_key: REDACTED                                             
    force: no                                                            
    local: no                                                            
directory: D:\Library\Music                                              

import:                                                                  
    move: yes                                                            
    incremental: yes                                                     

paths:                                                                   
    default: $albumartist/$album%aunique{}/$track $title                 
    singleton: Non-Album/$artist/$title                                  
    comp: Compilations/$album%aunique{}/$track $title                    

ui:                                                                      
    color: yes                                                           

musicbrainz:                                                             
    user: xhocquet                                                       
    pass: REDACTED                                                       

plugins: missing mbcollection mbsync lyrics random mbsubmit              
mbcollection:                                                            
    auto: yes                                                            
    collection: 51e9fbd7-7fdf-4fd5-a315-4cd225562474                     
    remove: yes                                                          
mbsubmit:                                                                
    format: $track. $title - $artist ($length)                           
    threshold: medium                                                    
missing:                                                                 
    count: no                                                            
    total: no                                                            
    album: no                                                            
sampsyo commented 4 years ago

Hi! Looking at the code, it doesn’t look like the Genius backend logs any messages in the success case or in several possible failure cases. So if the plug-in were making requests and not finding anything for some reason, it’s possible that you wouldn’t see anything in the log.

So: is your diagnosis that the plugin isn’t making requests based on the log alone, or some other evidence like network traffic? If the former, something could indeed be broken, but we’ll need to add more logging to find out what.

xhocquet commented 4 years ago

@sampsyo Spot on, I just expected a log like the other matchers but do not know for sure. How can I generate more logs to further debug this issue? Happy to help! As you saw, the link to the song matches all the fields from my metadata, so not sure why it wouldn't be working.

sampsyo commented 4 years ago

Great! I think we should add a few self._log.debug calls, like this one here: https://github.com/beetbox/beets/blob/60bba370c0ecc58b4057d78867273e5936d22dee/beetsplug/lyrics.py#L366-L367

…elsewhere throughout the backend's logic.

For example, I think the most likely culprit is that this condition is false: https://github.com/beetbox/beets/blob/60bba370c0ecc58b4057d78867273e5936d22dee/beetsplug/lyrics.py#L402-L404

So, after that, we could put something like:

else:
    self._log.debug('No matching artist.')
    return None

or similar. Then, maybe it would make sense to log something about the contents of json["response"]["hits"] to diagnose why this is missing.

If you do go about adding this extra logging (thanks!!), could I ask you the favor of opening a pull request so other users can benefit too?

xhocquet commented 4 years ago

@sampsyo I wasn't expecting to patch this up, but I'd be happy to try! I love this project and my Python is passable 😄 . I'll start looking into dev setup and see what I can do! Thanks for the pointers.

xhocquet commented 4 years ago

@sampsyo Cool, made a little progress so far. I added debuggers as suggested and I believe I've landed myself smack dab in one of Python (2)'s most infamous headaches: string encodings!

What I'm seeing when I print out the primary artist of the Genius response:

lyrics: Red Marlow
lyrics: Christopher Marlowe
lyrics: Christopher Marlowe
lyrics: Marlowe
lyrics: Christopher Marlowe
lyrics: Marlowe
lyrics: ​billy woods
lyrics: Marlowe
lyrics: Marlowe
lyrics: Christopher Marlowe

In fact, the billy_woods response in my terminal has an extra character and one that causes issues trying to decode:

Screen Shot 2019-12-04 at 1 07 21 PM

I tried a couple things:

  1. Using .strip() which doesn't seem to remove the character
  2. Using .encode() with both utf-8 and ascii, both return a similar but different error:
    • UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
    • UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 0: ordinal not in range(128)
  3. Using str() to convert to a string. Fails with error:
    • UnicodeEncodeError: 'ascii' codec can't encode character u'\u200b' in position 0: ordinal not in range(128)

I'm sure by just trying a bunch of different options I could get it to work, but two questions for you: 1) Is this even the right approach? Is stripping the strings appropriate during comparisons? Or is this simply a data issue for Genius? 2) If this is something we should do, any guidance on existing patterns to deal with this?

Cheers!

sampsyo commented 4 years ago

Wow!! That's quite a thing! According to your cut n' paste, that's Unicode character U+200B ZERO WIDTH SPACE. I have no clue why Genius would include that on that artist. It definitely seems like a data issue in Genius.

If it affects more than this specific single artist, maybe it's worth adding a fix that strips this specific character, i.e., artist.strip(u'\u200b').

Regardless, however, adding the extra logging to make this problem more visible seems like a good idea!

sampsyo commented 4 years ago

Aha: https://genius.com/Genius-users-how-to-add-a-zero-width-space-annotated

xhocquet commented 4 years ago

@sampsyo Nice find! Makes sense.. I guess? I can think of a few different ways to maintain capitalization without weird characters though 🤔

So if it's a consistent pattern, can we add the strip() call? If so I'll do that and open a PR with some new loggers as well. I actually ended up using a log on each entry that came back from the list, which may be a bit too much for normal users. What do you think? Just one item saying Genius could not match the artist?

sampsyo commented 4 years ago

Yeah, that sounds great!

Indeed, I don't think we need to log every artist. Just logging that a match wasn't found seems good.

xhocquet commented 4 years ago

@sampsyo v1 up for your review! https://github.com/beetbox/beets/pull/3448

For my use-case, I also had to lowercase the artist comparisons to get it to match. I see a lot of lowercasing going on throughout the codebase, so I assumed that was a safe operation. Let me know if I need to make any changes!

xhocquet commented 4 years ago

Fixed!