coursera-dl / edx-dl

A simple tool to download video lectures from edx.org (and other openedx sites)
GNU Lesser General Public License v3.0

Too many requests - http 429 #626

Closed: gg4u closed this issue 4 years ago

gg4u commented 4 years ago

Can't download - Too many requests - http 429

steps to reproduce

I wonder whether the credentials are being blocked by the backend.

See trace below:


edx_dl version 0.1.13
Building initial headers for future requests.
Getting initial CSRF token.
Traceback (most recent call last):
  File "/Users/XXX/Sites/miniconda3/bin/edx-dl", line 8, in <module>
    sys.exit(main())
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 1006, in main
    headers = edx_get_headers()
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 431, in edx_get_headers
    'X-CSRFToken': _get_initial_token(EDX_HOMEPAGE),
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/site-packages/edx_dl/edx_dl.py", line 168, in _get_initial_token
    opener.open(url)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 563, in error
    result = self._call_chain(*args)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 755, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 641, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
    result = func(*args)
  File "/Users/XXX/Sites/miniconda3/lib/python3.7/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests

Edit

I also noticed this strange behaviour:

gg4u commented 4 years ago

Solution

Found a solution. The problem was a mismatch between two User-agent headers: the one declared at the added line 166 (see comment https://github.com/coursera-dl/edx-dl/issues/468#issuecomment-395139398), where I had added the line opener.addheaders = [('User-agent', 'Mozilla/5.0')]

and the header used in edx_get_headers():

Make sure you use the same User-agent in both places; otherwise the user agent sent when fetching the initial token will not match the one sent with future requests.
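One way to keep the two values from drifting apart is to define the user agent once and reuse it. The following is only a minimal sketch of that idea, not the actual edx_dl code: USER_AGENT, build_opener_with_agent and build_headers are hypothetical names, and the real edx_get_headers() builds more headers than shown here.

```python
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical constant: the single User-Agent string shared by all requests.
USER_AGENT = 'Mozilla/5.0'


def build_opener_with_agent(user_agent=USER_AGENT):
    """Return an opener for the initial token request (hypothetical helper)."""
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    # Same assignment as the line added in the workaround, but using the constant.
    opener.addheaders = [('User-agent', user_agent)]
    return opener


def build_headers(csrf_token, user_agent=USER_AGENT):
    """Return headers for later requests (stand-in for edx_get_headers())."""
    return {
        'User-Agent': user_agent,
        'X-CSRFToken': csrf_token,
    }


if __name__ == '__main__':
    opener = build_opener_with_agent()
    headers = build_headers('dummy-token')
    # Both places now report the same user agent.
    print(dict(opener.addheaders)['User-agent'] == headers['User-Agent'])
```

Because both helpers default to the same constant, changing the user agent in one place changes it everywhere.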