enricobacis / lyricwikia

Python API to get song lyrics from LyricWikia
MIT License

Max retries exceeded with url ... #11

Closed paulthemagno closed 4 years ago

paulthemagno commented 4 years ago

I ran into this problem while calling lyricwikia.get_lyrics(artist, title) in a loop over a number of (artist, title) pairs. Lyrics are found for many of them, but not for some. This is my code:

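import sys

import lyricwikia


def get_lyrics_body_by_title_and_artist(title, artist):
    try:
        return lyricwikia.get_lyrics(artist, title)
    except lyricwikia.LyricsNotFound:
        # No lyrics page exists for this (artist, title) pair.
        return None
    except Exception as e:
        # Any other failure (e.g. a network error): report it and stop.
        # Returning from a finally block here would hide the exit.
        print(e)
        sys.exit(1)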

I ran the same kind of processing two months ago without problems; the only exceptions raised were lyricwikia.LyricsNotFound. Now other kinds of exceptions are raised as well, so I added the except Exception as e: clause to catch them and understand what the problem is.

Sometimes an exception like this was raised:

HTTPSConnectionPool(host='lyrics.fandom.com', port=443): Max retries exceeded with url: /wiki/Geoff_Bullock:Light_To_Blinded_Eyes (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa89ce0c1d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

I tried lyrics = lyricwikia.get_lyrics("Geoff Bullock", "Blinded Eyes") on its own and it worked, so I don't know what causes the failure during the full run. Every time I run the program, this exception occurs with a different (artist, title) pair. Why did it work well a few months ago without except Exception as e:, while now some songs are not retrieved?

enricobacis commented 4 years ago

It might be that LyricWikia is blocking your scraping (which violates their ToS) by returning HTTP errors when too many requests are made in a short amount of time. You could try adding a time.sleep(1) call in your loop.

However, at the end of the day, they say that scraping their website is not allowed.
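
The pause suggested above can be combined with a small retry, since the error quoted in the question is a temporary name-resolution failure that moves to a different song on each run. What follows is only a minimal sketch, not part of lyricwikia: get_with_retry and its parameters are hypothetical names, and it assumes the failure surfaces as a requests connection error, which is a subclass of OSError.

```python
import time


def get_with_retry(fetch, artist, title, retries=3, delay=1.0,
                   transient=(OSError,), sleep=time.sleep):
    """Call fetch(artist, title), retrying errors treated as transient.

    Hypothetical helper: `fetch` stands in for lyricwikia.get_lyrics and
    `transient` lists the exception types assumed to be worth retrying.
    Any other exception (such as a "not found" error) propagates at once.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(artist, title)
        except transient:
            if attempt == retries:
                raise  # still failing on the last attempt: give up
            sleep(delay * attempt)  # wait a little longer each time
```

In the loop from the question this could be called as get_with_retry(lyricwikia.get_lyrics, artist, title), with the lyricwikia.LyricsNotFound handling kept around it and a time.sleep(1) between songs. It does not change the fact that the site's terms disallow scraping.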