beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

Lyrics plugin cannot fetch lyrics when accented (non-ascii?) characters in URL (from artist name or title) #2357

Closed katonagl closed 7 years ago

katonagl commented 7 years ago

Problem

I am trying to fetch lyrics for songs where either the artist name or the title contains non-ASCII characters.

The command used is beet -vv lyrics halász micimackó

The answer is:

lyrics: failed to fetch: http://lyrics.wikia.com/Hal%C3%A1sz_Judit:Micimack%C3%B3 (404)
lyrics: failed to fetch: https://www.musixmatch.com/lyrics/Hal%C3%A1sz-Judit/Micimack%C3%B3 (404)
lyrics: lyrics not found: Halász Judit - Halász Judit - Micimackó

However, the second link does exist. The problem is probably related to character encoding, since titles with only ASCII characters work.
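To help check the encoding theory, here is a minimal sketch that percent-encodes the artist and title the way the logged URL appears to be built. The helper name `lyrics_url` and the space-to-hyphen rule are assumptions inferred from the log line above, not the plugin's actual code.

```python
from urllib.parse import quote


def lyrics_url(artist, title):
    """Hypothetical helper: build a Musixmatch-style URL from artist and title."""
    base = 'https://www.musixmatch.com/lyrics/'
    # Assumption: spaces become hyphens, as in the logged URL.
    # quote() encodes non-ASCII characters as UTF-8 percent-escapes.
    parts = [quote(s.replace(' ', '-')) for s in (artist, title)]
    return base + '/'.join(parts)


url = lyrics_url('Halász Judit', 'Micimackó')
print(url)
```

If this prints the same URL as the log, the non-ASCII characters are being escaped as valid UTF-8, which would suggest the 404 has another cause.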

Setup

My configuration (output of beet config) is:

lyrics:
    bing_lang_from: []
    force: yes
    auto: yes
    google_API_key: REDACTED
    bing_client_secret: REDACTED
    genius_api_key: REDACTED
    google_engine_ID: REDACTED
    bing_lang_to:
    fallback:
    sources:
    - google
    - lyricwiki
    - lyrics.com
    - musixmatch

import:
    move: yes
directory: /media/music
mbsubmit:
    format: $track. $title ($length)
    threshold: medium
library: ~/.config/beets/musiclibrary.blb

plugins: lyrics fetchart fromfilename mbsubmit scrub
scrub:
    auto: yes
fetchart:
    auto: yes
    google_engine: 001442825323518660753:hrh5ch1gjzm
    cautious: no
    cover_names:
    - cover
    - front
    - art
    - album
    - folder
    sources:
    - filesystem
    - coverart
    - itunes
    - amazon
    - albumart
    store_source: no
    maxwidth: 0
    enforce_ratio: no
    google_key: REDACTED
    fanarttv_key: REDACTED
    minwidth: 0

sampsyo commented 7 years ago

Hi! It actually looks like this isn't an encoding issue but a case of Musixmatch blocking our scraper:

>>> import requests
>>> requests.get('https://www.musixmatch.com/lyrics/Hal%C3%A1sz-Judit/Micimack%C3%B3')
<Response [404]>

We can change our user-agent from the requests default, but I'm afraid it will only be a matter of time before that gets blocked too. We'll see, I guess?
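For reference, a rough sketch of what sending a custom User-Agent looks like. It uses the standard library's `urllib.request` so it runs without extra packages; with requests, the equivalent is passing `headers={'User-Agent': ...}` to `requests.get`. The `USER_AGENT` string and the `build_request` helper are made up for illustration and are not what the plugin uses.

```python
import urllib.request

# Hypothetical User-Agent string, for illustration only.
USER_AGENT = 'beets-lyrics-example/0.1'


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: build a request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = build_request(
    'https://www.musixmatch.com/lyrics/Hal%C3%A1sz-Judit/Micimack%C3%B3'
)
# urllib stores header names capitalized as 'User-agent'.
print(req.get_header('User-agent'))

# Sending is left out so the sketch runs offline:
# response = urllib.request.urlopen(req)
```

Whether the site accepts any particular User-Agent is up to the site, so this may stop working at any time.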

sampsyo commented 7 years ago

OK, I've added a User-Agent header to the plugin. It's not a real fix, of course, but it might make this work for now! Care to give it a try?