crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development
https://www.crwlr.software/packages/crawler
MIT License
312 stars 11 forks

Flexible Auto-Retries for any kind of error responses (4xx, 5xx) #121

Open otsch opened 10 months ago

otsch commented 10 months ago

As discussed in https://github.com/crwlrsoft/crawler/issues/99#issuecomment-1739671602, it would be nice to be able to use the RetryErrorResponseHandler more flexibly, so that you can configure auto-retries for any kind of error response. I'm not yet sure about the wait times implemented in the RetryErrorResponseHandler; they should probably only be used for the special error responses (429, 503). @ruerdev

ruerdev commented 9 months ago

@otsch Good to know about the RetryErrorResponseHandler, I wasn't aware of it. I think it will be very useful once we have more flexibility in how error responses are handled.

It might be a good idea to let users pick a shorter wait time when they get a 429 error while using proxies, since they will switch to a different IP for their next request anyway.

otsch commented 9 months ago

> It might be a good idea to let users pick a shorter wait time when they get a 429 error

You can already customize the wait times; see https://www.crwlr.software/packages/crawler/v1.1/the-crawler/politeness#wait-and-retry. I'll think about automatically setting lower default wait times for those two error responses when the new HttpLoader::useRotatingProxies() method is called 👍🏻
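
To make the idea in this thread concrete, here is a rough, library-agnostic sketch in Python. It is not the crwlr API and not how RetryErrorResponseHandler is implemented: `RetryPolicy`, `fetch_with_retries`, and all wait values are hypothetical names and numbers chosen for illustration. It shows retries being configurable for any error status, with the longer waits reserved for 429 and 503, and shorter ones when rotating proxies are assumed to be in use.

```python
import time
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    """Hypothetical retry policy; names and wait values are assumptions."""
    # Which status codes trigger a retry (here: every 4xx and 5xx).
    retry_statuses: frozenset = frozenset(range(400, 600))
    # Waits in seconds between attempts for ordinary error responses.
    default_waits: tuple = (1, 2, 4)
    # Longer waits for the "special" responses 429 and 503.
    rate_limit_waits: tuple = (10, 60, 120)
    # Shorter waits for 429/503 when the next request uses another IP.
    proxy_rate_limit_waits: tuple = (1, 2, 3)
    rotating_proxies: bool = False

    def waits_for(self, status):
        """Return the wait schedule for a status, or () if it is not retried."""
        if status not in self.retry_statuses:
            return ()
        if status in (429, 503):
            if self.rotating_proxies:
                return self.proxy_rate_limit_waits
            return self.rate_limit_waits
        return self.default_waits


def fetch_with_retries(send, policy, sleep=time.sleep):
    """Call send() until it returns a non-error status or retries run out.

    send is any callable returning an HTTP status code (int).
    """
    status = send()
    attempt = 0
    while status >= 400:
        waits = policy.waits_for(status)
        if attempt >= len(waits):
            break  # not retryable, or retries exhausted
        sleep(waits[attempt])
        attempt += 1
        status = send()
    return status
```

Restricting `retry_statuses` to `frozenset({429, 503})` would retry only those two responses in this sketch, while the default covers every error response as requested in this issue.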