Serene-Arc / bulk-downloader-for-reddit

Downloads and archives content from reddit
https://pypi.org/project/bdfr
GNU General Public License v3.0
2.29k stars, 211 forks

403 Forbidden on Redgifs #175

MrC0D3 closed this issue 3 years ago

MrC0D3 commented 3 years ago

Getting a 403 Forbidden error on Redgifs using the latest version, 1.9.4. Is anyone else encountering the same?

ymgenesis commented 3 years ago

Yup, I'm getting this on every Redgifs download attempt as well.

HTTPError: HTTP Error 403: Forbidden
See CONSOLE_LOG.txt for more information
ERROR:root:HTTPError
Traceback (most recent call last):
  File "bulk-downloader-for-reddit/script.py", line 351, in <module>
    main()
  File "bulk-downloader-for-reddit/script.py", line 337, in main
    else: download(posts)
  File "bulk-downloader-for-reddit/script.py", line 155, in download
    downloadPost(details,directory)
  File "/bulk-downloader-for-reddit/script.py", line 95, in downloadPost
    downloaders[SUBMISSION['TYPE']] (directory,SUBMISSION)
  File "bulk-downloader-for-reddit/src/downloaders/redgifs.py", line 15, in __init__
    POST['MEDIAURL'] = self.getLink(POST['CONTENTURL'])
  File "bulk-downloader-for-reddit/src/downloaders/redgifs.py", line 41, in getLink
    pageSource = (urllib.request.urlopen(url).read().decode())
  File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Something about the way the redgifs URL is opened/read/decoded?
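One plausible culprit is the User-Agent header: unless a request sets its own, urllib.request.urlopen() identifies itself as Python-urllib/<major>.<minor> rather than as a browser, and some sites reject that. The sketch below only inspects urllib's defaults and makes no network call; default_user_agent() is a hypothetical helper name, not part of bdfr.

```python
import sys
import urllib.request


def default_user_agent():
    """Return the User-Agent urllib attaches when a request sets none itself."""
    # build_opener() returns an OpenerDirector; its addheaders list is added to
    # every request that does not already carry a header of the same name.
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)['User-agent']


# The version suffix is the running interpreter's major.minor version,
# e.g. Python-urllib/3.6 under Python 3.6.
expected = 'Python-urllib/%d.%d' % sys.version_info[:2]
print(default_user_agent(), default_user_agent() == expected)
```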

Brisppy commented 3 years ago

It seems that Redgifs returns a 403 if no user-agent is provided. Place this on line 40 of 'src/downloaders/redgifs.py'; it adds the current Chrome user-agent to the request:

url = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'})
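A minimal sketch of this change, assuming getLink() fetches the watch page with urlopen() as the traceback above shows. Strictly speaking, urllib already sends a User-Agent (Python-urllib/3.x); wrapping the URL in a Request with an explicit header replaces that default with the browser string quoted in the comment. build_request() and get_page_source() are hypothetical names for illustration, not the project's actual code.

```python
import urllib.request

# Browser User-Agent string quoted in this thread (Chrome 87 on Windows 10).
CHROME_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36')


def build_request(url):
    """Wrap the URL in a Request that carries a browser User-Agent header."""
    # A header set on the Request takes precedence over the opener's default
    # 'Python-urllib/3.x' User-Agent.
    return urllib.request.Request(url, headers={'User-Agent': CHROME_UA})


def get_page_source(url):
    """Hypothetical stand-in for getLink(): fetch the watch page as text."""
    # urlopen() accepts a Request object as well as a plain URL string.
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode()
```

Because urlopen() accepts either a string or a Request, only the object passed to it changes; the .read().decode() handling of the page source stays the same.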

ymgenesis commented 3 years ago

> It seems that redgifs returns a 403 if no user-agent is provided. Place this on line 40 of 'src/redgifs.py'. It adds the current chrome user-agent to the request: url = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'})

GitHub seems to have messed up the tab/newline/space formatting of the line to place at line 40, and I'm not entirely familiar with Python. I get:

url = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'})
^
TabError: inconsistent use of tabs and spaces in indentation

Do I place it right under line 39, which has this line?

url = "https://redgifs.com/watch/" + url.split('/')[-1]

Also, I'm assuming you meant src/downloaders/redgifs.py.

EDIT: Figured it out. It didn't like the newlines around lines 39-44. Here's the pastebin of my full src/downloaders/redgifs.py file, which currently works for downloading Redgifs.
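The TabError above is what Python raises when one block mixes tab and space indentation in an ambiguous way. The sketch below reproduces it with compile() on a made-up snippet and then rewrites only the leading tabs as spaces. The helper names, the example function, and the 4-space tab width are all assumptions; match whatever indentation the rest of src/downloaders/redgifs.py actually uses.

```python
# Made-up snippet: one line indented with 4 spaces, the next with a tab.
BROKEN = "def example(url):\n    url = url.strip()\n\treturn url\n"


def compiles(source):
    """Return False if the source fails to compile with a TabError."""
    try:
        compile(source, 'redgifs.py', 'exec')
    except TabError:
        return False
    return True


def normalize_indentation(source, tabsize=4):
    """Expand tabs in each line's leading whitespace only, leaving the rest intact."""
    fixed = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(' \t')
        indent = line[:len(line) - len(body)]
        fixed.append(indent.expandtabs(tabsize) + body)
    return ''.join(fixed)


print(compiles(BROKEN))                         # False
print(compiles(normalize_indentation(BROKEN)))  # True
```

Only the leading whitespace is expanded so that tabs inside string literals are left alone. For an existing file, the standard library's `python -m tabnanny path/to/file.py` can also point out lines with ambiguous indentation.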

ymgenesis commented 3 years ago

It seems DIRECT downloads are now affected by the urllib change? I'm getting the following on every DIRECT download attempt:

HTTPError: HTTP Error 503: Backend is unhealthy

EDIT: Not happening to me anymore. Must've been a one-time thing.

MrC0D3 commented 3 years ago

Thanks @ymgenesis, adding the user agent seems to have done the trick for me. I haven't run into that 503 yet, but I'll comment back here if I encounter it.

ymgenesis commented 3 years ago

> Thanks @ymgenesis adding the user agent seems to have done the trick for me. Haven't yet run into that 503 but I'll comment back in here if I encounter it.

Actually fixed itself the next day. Must've been a server-side issue.