gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.2k stars, 1.76k forks

Adding backoff strategies #487

Open i25959341 opened 4 years ago

i25959341 commented 4 years ago
asciimoo commented 4 years ago

Could you explain it?

i25959341 commented 4 years ago

I currently use something like https://github.com/jpillora/backoff to do backoff with gocolly. I was wondering whether we could incorporate this into the library, as it would be quite valuable to users.

WGH- commented 4 years ago

Doing retries is somewhat awkward in Colly, to be honest. Unless I'm missing something simple, you have to save some retry state in the context, retrieve and update it in OnRequest and OnError, sleeping if necessary. Everything manually.

I think most crawlers should do retries and backoff, so it would make sense if it was built in.

asciimoo commented 4 years ago

Good points, could you work on this?

WGH- commented 4 years ago

Perhaps. I can't give any ETA yet, though.

i25959341 commented 4 years ago

A problem with retry is that it uses the same proxy. Ideally, I would want to use a different proxy if I have to retry.
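
One way to think about this, as a sketch only: derive the proxy from the attempt number, so that a retry is shifted to the next proxy in the list. The function `proxyForAttempt` and the proxy addresses in the usage note are hypothetical, and how the attempt number would reach Colly's proxy selection is left open here.

```go
package main

import "hash/fnv"

// proxyForAttempt picks a proxy for the given URL and attempt number
// (0 for the first try). The URL hash spreads different URLs across the
// proxies; adding attempt shifts to the next proxy on every retry, so two
// consecutive attempts never share a proxy when at least two are configured.
// It returns "" when no proxies are given.
func proxyForAttempt(proxies []string, rawURL string, attempt int) string {
	if len(proxies) == 0 {
		return ""
	}
	h := fnv.New32a()
	h.Write([]byte(rawURL)) // Write on a hash.Hash never returns an error
	n := uint32(len(proxies))
	// Reduce both terms modulo n first so the sum cannot overflow.
	return proxies[(h.Sum32()%n+uint32(attempt)%n)%n]
}
```

For example, with proxies `http://proxy1.example:8080` and `http://proxy2.example:8080`, attempts 0 and 1 for the same URL always land on different proxies, while attempt 2 wraps around to the first choice again.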

i25959341 commented 4 years ago

If you have to implement it, what would the solution look like?