HoverHell / RedditImageGrab

Downloads images from sub-reddits of reddit.com.
GNU General Public License v3.0

urllib2.HTTPError: HTTP Error 404: Not Found #16

carcinocron opened this issue 10 years ago (status: Open)

Fair warning: my version of the script is modified, and these modifications were my first-ever attempt at Python, but this stack trace is similar enough to #10 that it probably affects the original code too.

I keep getting this error:

Traceback (most recent call last):
  File "redditdownload.py", line 212, in <module>
    URLS = extract_urls(ITEM['url'])
  File "redditdownload.py", line 137, in extract_urls
    urls = process_imgur_url(url)
  File "redditdownload.py", line 111, in process_imgur_url
    return extract_imgur_album_urls(url)
  File "redditdownload.py", line 29, in extract_imgur_album_urls
    response = urlopen(album_url)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

I think this happens when the script runs without the --update flag and reaches the absolute last entry in the sub's list of posts. I suspect it is specifically 404'ing on the URL of the "next page".

Other than that, all the images I would reasonably expect to download seem to be downloading successfully.
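
If the "next page" URL really is what 404s, a try/except around that one fetch would make the failure non-fatal instead of killing the whole run. A rough sketch, not the script's actual code (the helper name is made up; the real script calls urlopen() directly inside functions like extract_imgur_album_urls):

    import urllib2

    def urlopen_or_none(url):
        # Fetch a URL, but treat a 404 as "no more pages" rather
        # than crashing the run; re-raise anything else.
        try:
            return urllib2.urlopen(url)
        except urllib2.HTTPError as err:
            if err.code == 404:
                print '    Skipping [%s]: %s' % (url, err)
                return None
            raise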

carcinocron commented 10 years ago

I could be totally wrong, though; my logs don't show anything. While writing this up, I noticed the following:

                    # Download the image
                    download_from_url(URL, FILEPATH)

                    # Image downloaded successfully!
                    print '    Downloaded URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))
                    DOWNLOADED += 1
                    FILECOUNT += 1

That means a failed download wouldn't get logged anyway, because the URL is only logged after the download succeeds. I just made the following change:

                    print '    Attempting to Download URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))

                    # Download the image
                    download_from_url(URL, FILEPATH)

                    # Image downloaded successfully!
                    print '    Downloaded URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))
                    DOWNLOADED += 1
                    FILECOUNT += 1

Now my logs should record the URL before the download attempt fails (and all the evidence is lost), so hopefully I can follow up with better information.
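
Catching the error around the download itself would go one step further and keep the URL and the exception together in the log. A sketch of that, reusing the variables from the snippet above (whether download_from_url lets urllib2's errors propagate is an assumption on my part):

    from urllib2 import HTTPError, URLError

    try:
        # Download the image
        download_from_url(URL, FILEPATH)
    except (HTTPError, URLError) as err:
        # Keep the URL and the error together instead of losing the evidence
        print '    Failed URL [%s]: %s' % (URL.encode('utf-8'), err)
    else:
        # Image downloaded successfully!
        print '    Downloaded URL [%s] as [%s].' % (URL.encode('utf-8'), FILENAME.encode('utf-8'))
        DOWNLOADED += 1
        FILECOUNT += 1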

ghost commented 10 years ago

That seems like a reasonable change; I've added it to the script now. Did you get any further tracking down your 404 issue?

emacsomancer commented 9 years ago

I haven't modified the code in any way, and I get this too.

Traceback (most recent call last):
  File "/home/username/Apps/RedditImageGrab/redditdownload.py", line 268, in <module>
    URLS = extract_urls(ITEM['url'])
  File "/home/usernameApps/RedditImageGrab/redditdownload.py", line 197, in extract_urls
    urls = process_deviant_url(url)
  File "/home/username/Apps/RedditImageGrab/redditdownload.py", line 167, in process_deviant_url
    response = urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

I got it from running:

python2 redditdownload.py -sfw FractalPorn /home/username/WALLPAPER -score 50
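
For what it's worth, a 403 Forbidden from urlopen() in process_deviant_url often just means the server rejected urllib2's default User-Agent ('Python-urllib/2.7'). If that's the cause here, sending a browser-like header may get past it; a minimal sketch (the header value is only an example, and the helper is not part of the script):

    import urllib2

    def urlopen_with_ua(url):
        # Some hosts answer urllib2's default User-Agent with a 403,
        # so present a browser-like one instead.
        req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        return urllib2.urlopen(req)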