Miserlou / SoundScrape

SoundCloud (and Bandcamp and Mixcloud) downloader in Python.
MIT License

Utilize rate limit information from error messages #203

Open Miserlou opened 6 years ago

Miserlou commented 6 years ago
~/Music $ soundscrape https://soundcloud.com/grrrreatdane/roll-in-peace-bootleg
Downloading: roll in peace (bootleg)
Problem downloading roll in peace (bootleg)
Traceback (most recent call last):
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundscrape/soundscrape.py", line 437, in download_tracks
    stream = client.get(track['stream_url'], allow_redirects=False, limit=200)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundcloud/client.py", line 133, in _request
    return wrapped_resource(make_request(method, url, kwargs))
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundcloud/request.py", line 148, in make_request
    result.raise_for_status()
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/requests/models.py", line 935, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 429 Client Error: Unknown for url: https://api.soundcloud.com/tracks/377559152/stream?limit=200&client_id=175c043157ffae2c6d5fed16c3d95a4c

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rjones/anaconda/bin/soundscrape", line 11, in <module>
    sys.exit(main())
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundscrape/soundscrape.py", line 119, in main
    process_soundcloud(vargs)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundscrape/soundscrape.py", line 292, in process_soundcloud
    id3_extras=id3_extras)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundscrape/soundscrape.py", line 460, in download_tracks
    puts_safe(e)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/soundscrape/soundscrape.py", line 1315, in puts_safe
    puts(text)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/clint/textui/core.py", line 57, in puts
    s = tsplit(s, NEWLINES)
  File "/Users/rjones/anaconda/lib/python3.6/site-packages/clint/utils.py", line 69, in tsplit
    string = string.replace(i, final_delimiter)
AttributeError: 'HTTPError' object has no attribute 'replace'
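The final AttributeError is a separate, fixable bug: download_tracks passes the HTTPError object itself to puts_safe, and clint's puts eventually calls str.replace, which exception objects don't have. A minimal sketch of a fix (the print here stands in for clint.textui.puts):

```python
# Sketch of a fix for puts_safe() in soundscrape.py: clint's puts()
# eventually calls str.replace(), so handing it an HTTPError object
# raises AttributeError. Coercing to str first keeps the intended
# behavior of printing the error message.
def puts_safe(text):
    # Coerce non-string input (e.g. a requests HTTPError) to str.
    if not isinstance(text, str):
        text = str(text)
    print(text)  # stand-in for clint.textui.puts(text)
```

With this change, the 429 message would be printed instead of being masked by the secondary AttributeError.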
canihavesomecoffee commented 6 years ago

The 429 seems to be because SoundScrape is getting too popular and is hitting rate limits (see https://developers.soundcloud.com/docs/api/rate-limits#global-limit).

If I manually request the URL that's throwing an error, I get something like this:

{
  "errors": [
    {
      "meta": {
        "rate_limit": {
          "bucket": "by-client",
          "max_nr_of_requests": 15000,
          "time_window": "PT24H",
          "name": "plays"
        },
        "remaining_requests": 0,
        "reset_time": "2018/01/06 13:38:36 +0000"
      }
    }
  ]
}

I'm uncertain how that can be easily resolved though :(
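Since the error body carries reset_time and remaining_requests, one option (per the issue title) would be to parse them and back off until the window resets. A rough sketch, assuming the payload shape shown above; the helper name is mine, not part of SoundScrape:

```python
import json
from datetime import datetime, timezone

def seconds_until_reset(error_body):
    """Parse SoundCloud's 429 payload and return how long to wait.

    Assumes the JSON structure shown above, e.g.
    {"errors": [{"meta": {"remaining_requests": 0,
                          "reset_time": "2018/01/06 13:38:36 +0000"}}]}
    """
    meta = json.loads(error_body)["errors"][0]["meta"]
    reset = datetime.strptime(meta["reset_time"], "%Y/%m/%d %H:%M:%S %z")
    now = datetime.now(timezone.utc)
    # Never return a negative wait if the window has already reset.
    return max(0.0, (reset - now).total_seconds())
```

The caller could then sleep for that many seconds before retrying, instead of crashing on the 429.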

Miserlou commented 6 years ago

Ah, dang it. I need more keys and to rotate them.
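Rotating keys could be as simple as round-robining over a pool and retrying on 429 with the next one. A sketch, with placeholder IDs (real keys would have to come from somewhere, which is the hard part):

```python
import itertools

# Placeholder pool of client_ids; a real version would load actual
# SoundCloud keys from config rather than hard-coding them.
CLIENT_IDS = ["key_one", "key_two", "key_three"]
_key_cycle = itertools.cycle(CLIENT_IDS)

def next_client_id():
    """Return the next key in round-robin order, so a 429 on one key
    lets the caller retry the same request with another."""
    return next(_key_cycle)
```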

ontheair81 commented 6 years ago

I have the same issue. Although I have been using soundscrape for some years now, I am not very experienced with soundcloud.com. I just have a cronjob on my Linux server that automatically downloads a daily news show from SoundCloud.

For the last few days I have been experiencing the same issue with HTTP error 429.

I need more keys and to rotate them.

I hope my question is not too noobish, but what does that mean? It would be nice to get a hint on how to solve this issue.

Thank you!

canihavesomecoffee commented 6 years ago

@ontheair81 Miserlou means that he needs more API keys to circumvent the 15K download limit imposed by SoundCloud. However, since SoundCloud is no longer allowing developers to sign up for new API keys, this is hard or even impossible to fix.

If you happen to have an API key already, you could replace the one that's built into SoundScrape, or help all users by passing the key on to Miserlou.

ontheair81 commented 6 years ago

Thank you for clarifying! Now I understand the issue. Unfortunately I don't have an API key, so I cannot help myself or other users by sharing one.

So I think we can just hope that SoundCloud will increase the limits. Anyway, thank you for the information!

kanoalani commented 6 years ago

Is there anything we can do to help? E.g. search for more API keys, add per-client rate limits, etc.?

canihavesomecoffee commented 6 years ago

I think that if someone would be able to bring in more keys, @Miserlou would appreciate it.

Today the 15K downloads were used up in about 3 hours...

bourdeau commented 6 years ago

Hey there.

Issues #206 & #204 are related.

I think SoundScrape cannot keep going with this client_id & secret key, or even with many of them, as it will always reach SoundCloud's API limits at some point.

As SoundCloud has closed new app registration, I think it would be better to just ask the user to log in and then scrape the DOM (with Selenium + Chrome/Firefox in --headless) to get the token and then download the tracks. If you go to URLs like:

https://api.soundcloud.com/i1/tracks/387417257/streams?client_id=MgT8dvRJVcFR9fI5Szar82usLfSQdg3n

You then get a response like:

{
  "http_mp3_128_url": "https://cf-media.sndcdn.com/XFlrBjPMUKHI.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vWEZsckJqUE1VS0hJLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTc3MzY3NTB9fX1dfQ__&Signature=wfLUQ5-w9NxFP7EOWqB0LN9junfC-DDb4ZNJ8rRJ0MNI0YorEGiCy13V4-nwatJ9G1TX8osBMtfzD~UfEyC-oRifpYWT~0sEnRQ19S9QQpYVg8QoDPCaCrfxMRxNHGpH1WQvGCdgYR5mI6mdj9gwj10ML~hTBbt7AE0~2jOKKy1nvZftydMjTt3cYGdR1gtUP2-J741be4TGzO~pSonV~rVgqbhntatlyTTo9uWj9CCwvGvX4sexZBXS3KPA-76XbqW1wXLbZoDKqtrLk2I9rQnWHyK~OvqUfoJE53HOE6eSS4Ql4JwutQ59sX6w8gao~yqwJFW988Y-MtEtS7zb4A__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ",
  "hls_mp3_128_url": "https://cf-hls-media.sndcdn.com/playlist/XFlrBjPMUKHI.128.mp3/playlist.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1tZWRpYS5zbmRjZG4uY29tL3BsYXlsaXN0L1hGbHJCalBNVUtISS4xMjgubXAzL3BsYXlsaXN0Lm0zdTgiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTc3MzY3NTB9fX1dfQ__&Signature=lpHe8hejzHGlLtOiuF1b2esSGUu8mgSCa1Y6wAHb0fioBJV5DLzWy~7XGvaSsxxzlJVSu~X2bGCmmQ0kdU0xwP7dLQX9enl2QJwhm3kkggfAfsCFtFToMmA6BxEBaeMtwwC0ePLRzvSaw7mTLBV2vURUxky7P2RpJD87MURx0n8-mGpsaf1rwMKM9dRLKW6kMFqbkppjl4~geuA1SRC12lWHRV8socCEwfu-evCU~Ds~pa8aX2bSj~BK1Erai0E7ht7~jQImxqVae2gyiqU60QofsYIjyWLbyEcLmdElGtdw3NUEP1TtEnAfTK8-zW6z0DifKmoLV-~jn8QstiCgQg__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ",
  "hls_opus_64_url": "https://cf-hls-opus-media.sndcdn.com/playlist/XFlrBjPMUKHI.64.opus/playlist.m3u8?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1vcHVzLW1lZGlhLnNuZGNkbi5jb20vcGxheWxpc3QvWEZsckJqUE1VS0hJLjY0Lm9wdXMvcGxheWxpc3QubTN1OCIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTUxNzczNjc1MH19fV19&Signature=nlLbtT5xpScnlENCznAyPlX0aRvMHA-Y1AXQieVjQg~dWWskwO7b2AB1LDydy7~fzmOkdu6GLoQyK174GLD1fcjy02FD4UQql799CEBtQ4Ker7YzNy4l78F3kbrU03KqcULWot2DvZpuUvNV3nGfUDobwCkC6JLPsx0dkmek8XyigeEemAsbQbHNWPissM10C4LgGzbekQhLwRrOVEQp9ixV7y8z6DghuOcrRg0RTbz~R~NKKdLP3A5tEnLcPPjv1dsyfK~B0dq~ddWFEbH7bPlcB0qLM7TsmGCEHtyTjfeFiYKtKpYZrXegKyUg-nTZcdenIHKsLXAELy5HiUXuLw__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ",
  "preview_mp3_128_url": "https://cf-preview-media.sndcdn.com/preview/0/30/XFlrBjPMUKHI.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLXByZXZpZXctbWVkaWEuc25kY2RuLmNvbS9wcmV2aWV3LzAvMzAvWEZsckJqUE1VS0hJLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MTc3MjkzMzF9fX1dfQ__&Signature=GiyBE4RM-ed6uOHn283mYHiRZVh-4v9vJ3KRjFX2pHZE-G~eC5CZBWN4nlqc18E5KJKpVI3UInRTnloIscatUuAtRKtKjiDR0kn5MxhQA7k2dLGq-2V0KvVCIm1eoSXRDkwFOomg15l62d5b7wWoL-1XJomC7JiEb8ayxPPEr5FRmip9cP05dk57OvqziIwjMIfCv7ubbkSxJ7s-lh9nUvojagQWQ2H~GT-50R0yYoYcFLvG~QpW8HiT2SBIOPT07M9wbavRbF7dqcW1xyStL2QHSWMcESBZBG-ea47oEVuJaYP57FTVCCSGjzbjgKpbWwNup1OSJ53vro50PnKoZw__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ"
}

You can then simply download from "http_mp3_128_url".
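That last step could be sketched like this. The /i1/tracks/<id>/streams endpoint and the "http_mp3_128_url" field are taken from the response above; it is undocumented and may break whenever SoundCloud changes internals, and the function name and opener parameter are my own:

```python
import json
from urllib.request import urlopen

def download_track(track_id, client_id, out_path, opener=urlopen):
    """Fetch the streams manifest for a track and save the 128kbps MP3.

    Uses the undocumented /i1/tracks/<id>/streams endpoint shown above.
    ``opener`` exists only so the network call can be swapped out in
    tests; by default it performs a real HTTP request.
    """
    manifest_url = (
        "https://api.soundcloud.com/i1/tracks/%s/streams?client_id=%s"
        % (track_id, client_id)
    )
    # First request: the JSON manifest listing the stream URLs.
    with opener(manifest_url) as resp:
        streams = json.loads(resp.read().decode("utf-8"))
    # Second request: the signed CloudFront MP3 URL itself.
    with opener(streams["http_mp3_128_url"]) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```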

The idea would be:

What do you think?

I can do it if you like, but as I don't know the SoundScrape lib, I have no idea how much refactoring would be needed to make that fit.

vinz243 commented 6 years ago

As a side note, it is possible to find client_ids by searching GitHub for client_id.

I'll try reverse engineering their client code

Lanchon commented 6 years ago

For most cases it would be enough to scrape anonymously without logging in. Of course, the problem is that SoundCloud can change their site to break this, over and over again.

Or they could not bother. FYI, there is an unofficial Play Store client that works great by scraping regular HTML (no Play API use), and Google never broke it.

In the interest of reuse, please consider implementing the scraper as a separate scraping lib.

erezsh commented 6 years ago

Would it help if users could provide their own keys?

goose-ws commented 6 years ago

+1 more for this issue. Is there a configurable option to supply one's own API key?

justintgav commented 4 years ago

+1, same problem today. Perhaps implement configuration options to provide our own keys, as @erezsh suggested?
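A user-supplied key override could be as simple as checking an environment variable before falling back to the bundled key. A sketch; SOUNDSCRAPE_CLIENT_ID is a name I made up, not an existing option, and the bundled key is the one visible in the traceback above:

```python
import os

# The key currently hard-coded in soundscrape.py (as seen in the
# traceback's request URL above).
BUNDLED_CLIENT_ID = "175c043157ffae2c6d5fed16c3d95a4c"

def get_client_id():
    """Prefer a user-provided key from the environment, so cron users
    who hit the shared key's daily limit can supply their own."""
    return os.environ.get("SOUNDSCRAPE_CLIENT_ID", BUNDLED_CLIENT_ID)
```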

goose-ws commented 4 years ago

My cron job running at 00:30 has been consistently failing for the last few days; the keys are already used up in the first half hour of the day (EST). Any solutions or workarounds?

boatcoder commented 4 years ago

You could also automate something like this, which seems to be able to download without too much grief.

https://www.klickaud.co/download.php

viatekh commented 3 years ago

My cron job running at 00:30 has been consistently failing for the last few days; the keys are already used up in the first half hour of the day (EST). Any solutions or workarounds?

AFAIK it's a rolling 24-hour period, not calendar days.

viatekh commented 3 years ago

So surely it must be simple to generate and plug in our own API keys? I am new to this but will have a play.