2hands10fingers / Reddit-Image-Scraper-1.0

Scrapes/downloads a selected subreddit's posted images by a specified date range on http://reddit.com
http://www.glotacosm.com

Handling DNS errors #23

Open klimli opened 6 years ago

klimli commented 6 years ago

When a URL fails with DNS_PROBE_FINISHED_NXDOMAIN, the previous error handling doesn't work. I left "as e:" on line 138 for further debugging.

2hands10fingers commented 6 years ago

@klimli Can I get a little more explanation on this before I merge? I'm not familiar with this error handling

klimli commented 6 years ago

Sure, though it's mostly empirical. The previous method failed when requests.get(url) itself failed to execute, which is the case I hit when it tried to download an image from a website that is no longer available. When I opened this URL in my browser, it raised a DNS_PROBE_FINISHED_NXDOMAIN error. The different errors in requests are explained here: docs

This discussion on Stack Overflow is also helpful.

At first I thought you would need two separate approaches (your original one and this one) to handle all errors, but I tried it on real Reddit data and it works fine with just this one.
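The failure mode described above can be sketched offline. This is a minimal illustration, not the scraper's actual code: `failing_get` and `status_only_download` are hypothetical names, the shape of the earlier handling (checking the response only after the call) is an assumption, and the built-in `ConnectionError` stands in for `requests.exceptions.ConnectionError`, which is what requests raises on a DNS failure.

```python
# Hypothetical stand-in for requests.get when the host cannot be resolved.
# The real library raises requests.exceptions.ConnectionError in that case;
# the built-in ConnectionError is used here so the sketch runs without network.
def failing_get(url):
    raise ConnectionError("Failed to resolve host for {}".format(url))


def status_only_download(url, get=failing_get):
    """Assumed shape of the earlier handling: inspect the response afterwards."""
    response = get(url)  # raises here, so the lines below never execute
    if response.status_code != 200:
        return -1
    return response.content


def demo():
    """Show that the exception escapes before any status check can run."""
    try:
        status_only_download("http://no-longer-registered.invalid/image.jpg")
    except ConnectionError as e:
        return "unhandled by the status check: {}".format(e)
    return "no exception"


if __name__ == "__main__":
    print(demo())
```

The point is only that a status-code check cannot help when no response object is ever returned.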

Thanks for the Scraper!

2hands10fingers commented 6 years ago

@klimli Hey there! Thanks for getting back to me with a detailed response. I would work on something a little more like this:

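try:
    # requests.get raises (e.g. on a DNS failure) before any response exists
    response = requests.get(url)
except Exception as e:
    print("An error occurred .... {}".format(e))
    return -1
if response.status_code == 200:  # existing download logic continues from here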

I hope that makes sense. I essentially want to account for all the exceptions. Let me know if there are any questions.
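For reference, the snippet above could be wrapped into a complete function along these lines. This is a sketch under stated assumptions rather than the code that was merged: `download_image` and its `get` parameter are hypothetical (the parameter exists only so a stub can be injected for testing), and returning -1 for a non-200 status is an assumption that mirrors the error case.

```python
def download_image(url, get=None):
    """Return the response body as bytes, or -1 if the request fails."""
    if get is None:
        import requests  # third-party; imported lazily so a stub can be injected
        get = requests.get
    try:
        response = get(url)
    except Exception as e:  # deliberately broad, as proposed in this thread
        print("An error occurred while requesting {}: {}".format(url, e))
        return -1
    if response.status_code == 200:
        return response.content
    return -1  # assumption: treat any non-200 status like a failed request


if __name__ == "__main__":
    # Stub that imitates an unresolvable host without touching the network.
    def unreachable(url):
        raise ConnectionError("name resolution failed")

    print(download_image("http://gone.invalid/pic.jpg", get=unreachable))
```

If catching every exception turns out to be too broad, `requests.exceptions.RequestException` is the base class for the errors requests itself raises and would be a narrower choice.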

klimli commented 6 years ago

Looks good. Also, I found another error that is not managed properly: https://github.com/2hands10fingers/Reddit-Image-Scraper-1.0/issues/24

Should I close the pull request so that you can just add this to the code yourself?