beetbox / beets

music library manager and MusicBrainz tagger
http://beets.io/
MIT License

MusixMatch server ignores requests with beets user agent #2546

Open Kraymer opened 7 years ago

Kraymer commented 7 years ago

With the beets user agent:

$ export BEETS_TEST_LYRICS_SOURCES=1
$ nosetests -s -v test/test_lyrics.py
======================================================================
FAIL: Test default backends with songs known to exist in respective databases.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/flap/Dev/beets/test/test_lyrics.py", line 316, in test_backend_sources_ok
    self.assertFalse(errors)
AssertionError: ['LyricsCom', 'MusiXmatch', 'Genius'] is not false
-------------------- >> begin captured logging << --------------------
[...]
requests.packages.urllib3.connectionpool: DEBUG: Starting new HTTPS connection (1): www.musixmatch.com
requests.packages.urllib3.connectionpool: DEBUG: https://www.musixmatch.com:443 "GET /lyrics/Santana/Black-Magic-Woman HTTP/1.1" 404 11
beets.lyrics: DEBUG: lyrics: failed to fetch: https://www.musixmatch.com/lyrics/Santana/Black-Magic-Woman (404)

After setting an empty ('') user agent:

======================================================================
FAIL: Test default backends with songs known to exist in respective databases.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/flap/Dev/beets/test/test_lyrics.py", line 316, in test_backend_sources_ok
    self.assertFalse(errors)
AssertionError: ['LyricsCom', 'Genius'] is not false
-------------------- >> begin captured logging << --------------------
[...]
requests.packages.urllib3.connectionpool: DEBUG: Starting new HTTPS connection (1): www.musixmatch.com
requests.packages.urllib3.connectionpool: DEBUG: https://www.musixmatch.com:443 "GET /lyrics/Santana/Black-Magic-Woman HTTP/1.1" 200 25146

sampsyo commented 7 years ago

Hmm; that's pretty bad news. I added the user agent due to #2357, because the API had blocked the generic requests UA. I expect this is a whack-a-mole game that we'll lose—switching next to an empty UA will likely just eventually cause that to be blocked. :cry:
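
For anyone who wants to check the behavior outside the test suite, here is a minimal sketch that requests the same page with different user agent strings and reports the HTTP status. It uses only the standard library rather than the requests package that beets uses, and the helper names `build_request` and `status_for` are made up for this example.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Return a Request for `url` that carries the given User-Agent string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10):
    """Fetch `url` and return the HTTP status code, including error codes."""
    try:
        with urllib.request.urlopen(
            build_request(url, user_agent), timeout=timeout
        ) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as exceptions; report their code instead.
        return exc.code
```

Calling `status_for("https://www.musixmatch.com/lyrics/Santana/Black-Magic-Woman", ua)` once per candidate `ua` string would show which ones the server currently accepts. The result depends on the server's rules at the time of the call, so it may not match the logs above.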

Kraymer commented 7 years ago

I'll post a PR that drops MM backend.

jackwilsdon commented 7 years ago

The alternative is to use something like fake-useragent to generate a completely random UA string.

Kraymer commented 7 years ago

UserAgent().random... 😁 That's tempting 😒 .

sampsyo commented 7 years ago

Tempting indeed. :smiley: I'm not sure how I feel ethically about randomizing the UA string—if MusixMatch really wants to block the beets scraper, it seems like eventually the right thing to do is to let them do that rather than continuing to find workarounds.

Maybe we can try out their API instead?

jackwilsdon commented 7 years ago

I think the reason we weren't using their API in the first place was that we can only access 30% of any song's lyrics 😕. It might be worth contacting them to see if they'll provide us with a pro API key for free :rofl:.

anarcat commented 7 years ago

i have seen the same behavior in #2630. it seems to me that making the UA customizable is a hack, but a good workaround to help users for now.

as for the API, isn't this just like the Genius or Google search stuff? People need to manage their own API creds and all, then we just talk to the API normally...

jackwilsdon commented 7 years ago

That kind of sounds like the best option for people who really want to use the MusixMatch plugin, as then they can change the UA to whatever they want (even their own browser).
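
A user-configurable UA could look roughly like the sketch below. This is not beets code: the `user_agent` option name, the `plugin_config` dict, and the default string are all hypothetical placeholders chosen for illustration.

```python
# Placeholder default; NOT the user agent string beets actually sends.
DEFAULT_USER_AGENT = "beets-example/0.0"


def resolve_user_agent(plugin_config):
    """Return the user's configured UA, or the default if none is set.

    An explicitly configured empty string is kept as-is, since the issue
    above shows that an empty UA can be a deliberate choice.
    """
    ua = plugin_config.get("user_agent")  # hypothetical option name
    if ua is None:
        return DEFAULT_USER_AGENT
    return str(ua)


def request_headers(plugin_config):
    """Build the headers dict to pass along with each lyrics request."""
    return {"User-Agent": resolve_user_agent(plugin_config)}
```

The only design point here is that "not configured" and "configured as empty" are treated differently, so a user can opt into the empty UA without changing the default for everyone else.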

anarcat commented 7 years ago

in my experience, changing the UA is not enough: you'll eventually get blocked as well and will have to fill out a CAPTCHA to keep going.

there should be some way to do rate-limiting here or something... i tried to implement a bloom filter in #2635 but that would only work if state is saved between sessions...
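
One way to keep rate-limiting state between sessions is to store the time of the last request in a small file. The sketch below is not the bloom filter approach from #2635; it swaps in a simpler minimum-interval limiter, and the class name, the state file format, and the interval are assumptions made for illustration.

```python
import json
import time


class PersistentRateLimiter:
    """Enforce a minimum interval between requests, across process restarts."""

    def __init__(self, state_path, min_interval,
                 clock=time.time, sleep=time.sleep):
        self.state_path = state_path      # JSON file holding the last timestamp
        self.min_interval = min_interval  # seconds required between requests
        self.clock = clock                # injectable for testing
        self.sleep = sleep                # injectable for testing

    def _load_last(self):
        """Return the stored timestamp, or None if missing or unreadable."""
        try:
            with open(self.state_path) as f:
                return float(json.load(f)["last_request"])
        except (OSError, ValueError, KeyError, TypeError):
            return None

    def _save_last(self, timestamp):
        with open(self.state_path, "w") as f:
            json.dump({"last_request": timestamp}, f)

    def wait(self):
        """Block until a request is allowed, record it, return seconds waited."""
        last = self._load_last()
        now = self.clock()
        delay = 0.0
        if last is not None:
            delay = max(0.0, last + self.min_interval - now)
        if delay:
            self.sleep(delay)
        # Record the moment the request is actually allowed to go out.
        self._save_last(now + delay)
        return delay
```

A caller would run `limiter.wait()` before each lyrics fetch. Because the timestamp lives on disk, a second `beet` invocation started right after the first would still wait out the remaining interval. This does nothing about concurrent processes racing on the file, which a real implementation would need to handle.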