beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

lyrics: Genius backend can fetch non-lyrics #1745

Open Nukien opened 8 years ago

Nukien commented 8 years ago

I'm importing a lot of albums, and spot-checking them has turned up some tracks that appear to be getting absolutely bizarre stuff for lyrics. The last few seem to have been mainly for instrumental or non-singing tracks, but that's not a concrete observation.

For example, please see https://www.sendspace.com/file/yb21pl

This contains two tracks: the one before importing (in Old/Various Artists) and the one after importing with beets (in Tagged/Various Artists).

The inserted lyrics appear to be from a post on genius.com (http://genius.com/2604138). Scroll way down, or search for "West to the people - made into music".

I would assume that once the issue is found and corrected, we would have to re-do the lyrics on all tracks. Unless someone has a way to verify the embedded lyrics for each ...

sampsyo commented 8 years ago

Thanks for reporting!

Could you please narrow down where the bad lyrics are coming from? Use beet lyrics -f SOME_SONG_HERE to re-fetch lyrics for a particular song. You can work out what source is causing the problem by changing your config to select a single lyrics source at a time.

sampsyo commented 8 years ago

Oops; I somehow missed the fact that you have evidence to blame the Genius source. Sorry!

Could you confirm that's what's going on? Just configure the plugin to use the genius backend and re-fetch. If you can, run the lyrics command in verbose mode so we can see a little more about how the backend is behaving.

Nukien commented 8 years ago

OK, configured to only use the Genius source. Here's the run - nothing looks out of the ordinary

$ beet -v lyrics -f energy 52 del mar 98
user configuration: /home/music/.config/beets/config.yaml
data directory: /home/music/.config/beets
plugin paths:
Sending event: pluginload
library database: /stuff/Music/Beets/Library/musiclibrary.blb
library directory: /stuff/Music/Tagged
Sending event: library_opened
lyrics: got lyrics from backend: Genius
lyrics: fetched lyrics: Energy 52 - Ibiza 99: The Year Of Trance - Caf? Del Mar '98 (Original Three 'N One Edit)
Sending event: write
Sending event: after_write
Sending event: database_change
Sending event: cli_exit

And the head of what it fetched

$ beet lyrics -p energy 52 del mar 98
lyrics: lyrics already present: Energy 52 - Ibiza 99: The Year Of Trance - Caf? Del Mar '98 (Original Three 'N One Edit)
[Empty Section] is best described as an ?open letter? from Kanye West to the people - made into music.Produced by >the Very G.O.O.D Beats crew,is sonically simplistic yet still possesses melody courtesy of the droning >synths.

If I take Genius out of the list and use only the other sources (sources: google lyricwiki lyrics.com musixmatch), then it doesn't find lyrics, which is more or less what is expected since it's an instrumental track. Unfortunately, it didn't remove the existing wrong lyrics or replace them with Instrumental, as I've seen on some other tracks.

library database: /stuff/Music/Beets/Library/musiclibrary.blb
library directory: /stuff/Music/Tagged
Sending event: library_opened
lyrics: failed to fetch: http://lyrics.wikia.com/Energy_52:Caf%C3%A9_Del_Mar_%2798_%28Original_Three_%27N_One_Edit%29 (404)
lyrics: failed to fetch: http://lyrics.wikia.com/Energy_52:Caf%C3%A9_Del_Mar_%2798 (404)
lyrics: lyrics not found: Energy 52 - Ibiza 99: The Year Of Trance - Caf? Del Mar '98 (Original Three 'N One Edit)
Sending event: cli_exit

sampsyo commented 8 years ago

Thanks for the extra detail! This should help reproduce the problem.

Could you please take a look, @sadatay?

sadatay commented 8 years ago

Huh, that's interesting. I wonder if the search endpoint is somehow retrieving articles from Genius as well as lyrics. The documentation is a little self-contradicting:

The search capability covers all content hosted on Genius (all songs).

I'll take a look into it at some point this week, thanks for bringing this to attention.

sampsyo commented 8 years ago

Yeah, that certainly is ambiguous. Thanks for looking into it!

sadatay commented 8 years ago

Hmm, so this seems to be something we might not be able to get around just now. I queried the Genius API for "top 10" to see if it turned up articles as well as songs.

[Screenshot, 2015-12-14 at 9:47:53 pm]

Unfortunately, as you can see here, the 'type' of this document is erroneously listed as 'song'. So I'm not really sure how we are supposed to differentiate between what are songs and not songs if there is erroneous data like this. Any ideas, anybody?

sadatay commented 8 years ago

Hmm, on second thought, checking out the "primary artist" field...

[Screenshot, 2015-12-14 at 9:50:42 pm]

Maybe we could filter out 'songs' that have "Rap Genius" listed as the artist? That might not cover everything but it'd probably help a little bit.
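
A minimal sketch of that filter, not the plugin's actual code. It assumes each search hit is a dict with the artist name under result -> primary_artist -> name; the helper name, the blocklist, and the sample hits are all hypothetical.

```python
# Hypothetical helper; not part of the beets lyrics plugin.
# Assumption: a hit looks like
#   {"type": "song", "result": {"title": ..., "primary_artist": {"name": ...}}}

# Assumption: editorial posts are credited to this account name.
EDITORIAL_ARTISTS = {"rap genius"}


def drop_editorial_hits(hits, blocked=EDITORIAL_ARTISTS):
    """Return only the hits whose primary artist is not a blocked account."""
    kept = []
    for hit in hits:
        result = hit.get("result") or {}
        artist = (result.get("primary_artist") or {}).get("name", "")
        # Compare case-insensitively so "Rap Genius" and "rap genius" both match.
        if artist.strip().lower() not in blocked:
            kept.append(hit)
    return kept


# Made-up sample data, only to show the shape the sketch expects.
sample_hits = [
    {"type": "song",
     "result": {"title": "Some editorial post",
                "primary_artist": {"name": "Rap Genius"}}},
    {"type": "song",
     "result": {"title": "Some real song",
                "primary_artist": {"name": "Energy 52"}}},
]
filtered = drop_editorial_hits(sample_hits)
```

As noted, this only catches posts credited to the blocked account, so it reduces the problem without eliminating it.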

sampsyo commented 8 years ago

Huh; that's definitely strange. I like the idea of filtering out "Rap Genius" as an artist, but you're right that we might not be able to get everything.

Perhaps we should also report a bug to the Genius people.

sampsyo commented 8 years ago

We could also filter the results to require that the title of the song returned by Rap Genius exactly match the title of the song we're filling out the lyrics for. This would also help prevent mix-ups when the search engine returns the "wrong" song.
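
A rough sketch of that title check, assuming each hit carries its title under result -> title (an assumption about the response shape, not something confirmed in this thread). The function names are hypothetical. The comparison here ignores case, surrounding whitespace, and Unicode composition differences, which is slightly looser than a byte-for-byte match.

```python
import unicodedata


def normalize_title(title):
    """Normalize a title for comparison: NFKC, casefold, collapse whitespace."""
    composed = unicodedata.normalize("NFKC", title)
    return " ".join(composed.casefold().split())


def titles_match(wanted, returned):
    """True when both titles are equal after normalization."""
    return normalize_title(wanted) == normalize_title(returned)


def first_matching_hit(hits, wanted_title):
    """Return the first hit whose title matches wanted_title, else None.

    Assumption: each hit is a dict with the title at hit["result"]["title"].
    """
    for hit in hits:
        title = (hit.get("result") or {}).get("title", "")
        if titles_match(wanted_title, title):
            return hit
    return None
```

With a check like this, a search that only turns up unrelated documents would return None, and the backend could report that no lyrics were found instead of storing the wrong text.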