GateNLP / ultimate-sitemap-parser

Ultimate Website Sitemap Parser
https://mediacloud.org/

Error in request causes total crash #14

Closed: bartmachielsen closed this issue 5 years ago

bartmachielsen commented 5 years ago

A timeout error on a single sitemap (one that does not exist) crashes the entire script, so no other sitemaps are tried and this error is raised:

ERROR [2019-07-26 07:36:22,600 sitemap_scanner: 24] HTTPSConnectionPool(host='dutchitchannel.nl', port=443): Read timed out. (read timeout=60)
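To illustrate the behavior being asked for, here is a minimal sketch of per-sitemap error isolation: a failure on one URL is recorded and the loop moves on to the next. This is not the ultimate-sitemap-parser API; `fetch_sitemaps`, `fake_fetch`, and the example URLs are hypothetical names made up for the illustration, and the fetcher is passed in as a plain callable so the sketch runs without network access.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_sitemaps(
    urls: Iterable[str],
    fetch: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Fetch every URL, recording failures instead of aborting the whole run."""
    fetched: Dict[str, str] = {}
    failed: List[Tuple[str, str]] = []
    for url in urls:
        try:
            fetched[url] = fetch(url)
        except Exception as exc:  # e.g. a read timeout on a single host
            # Record the failure and continue with the remaining sitemaps.
            failed.append((url, str(exc)))
    return fetched, failed


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP client; simulates one timeout.
    if "dutchitchannel.nl" in url:
        raise TimeoutError("Read timed out. (read timeout=60)")
    return "<urlset/>"


fetched, failed = fetch_sitemaps(
    [
        "https://example.com/sitemap.xml",        # hypothetical URL
        "https://dutchitchannel.nl/sitemap.xml",  # hypothetical path
    ],
    fake_fetch,
)
```

With this shape, the caller gets both the sitemaps that loaded and a list of the ones that failed, and can decide whether to log, retry, or ignore the failures.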

pypt commented 5 years ago

Thanks, will take a look (unless you'd like to submit a PR!)

pypt commented 5 years ago

Fixed, 0.5 released. If you're using a custom web client, have a look at the abstract class changes. If not, should work out of the box.

bartmachielsen commented 5 years ago

Thank you!