gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.38k stars, 1.77k forks

fix : retry redirect to AlreadyVisitedUrl will loop error #826

Open Shinku-Chen opened 1 month ago

Shinku-Chen commented 1 month ago

This PR fixes the case where a retried request is redirected to a URL that has already been visited, which makes the retry fail with the "AlreadyVisitedUrl" error.

This change does not prevent the "AlreadyVisitedUrl" error raised on an initial visit. It targets the same error when it is raised during a retry, which reduces the error loop that blind retries would otherwise cause.


https://github.com/gocolly/colly/issues/805