gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.2k stars 1.76k forks

Early cancellation of Queue.Run #697

Open arthurxxy opened 2 years ago

arthurxxy commented 2 years ago

The example:

package main

import (
    "fmt"

    "github.com/gocolly/colly/v2"
    "github.com/gocolly/colly/v2/queue"
)

func main() {
    url := "https://httpbin.org/delay/1"

    // Instantiate default collector
    c := colly.NewCollector(colly.AllowURLRevisit())

    // create a request queue with 2 consumer threads
    q, _ := queue.New(
        2, // Number of consumer threads
        &queue.InMemoryQueueStorage{MaxSize: 10000}, // Use default queue storage
    )

    c.OnRequest(func(r *colly.Request) {
        fmt.Println("visiting", r.URL)
        if r.ID < 15 {
            r2, err := r.New("GET", fmt.Sprintf("%s?x=%v", url, r.ID), nil)
            if err == nil {
                q.AddRequest(r2)
            }
        }
    })

    for i := 0; i < 5; i++ {
        // Add URLs to the queue
        q.AddURL(fmt.Sprintf("%s?n=%d", url, i))
    }
    // Consume URLs
    q.Run(c)

}

Question 1: How can I stop the queue when an error occurs? I tried the following, but it does not work:

c.OnError(func(r *colly.Response, err error) {
    q.Stop()
})

WGH- commented 2 years ago

Well, this is indeed a valid limitation of the Queue API as it is now.
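
For illustration, the general pattern behind a worker queue that can be cancelled early can be sketched with the standard library alone. This is not colly's implementation or API: `runQueue`, `errFailed`, and the integer jobs are hypothetical names chosen for this sketch, and it assumes at least one worker. The first failing job cancels a shared context, the producer stops feeding new jobs, and workers skip anything they receive after cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// errFailed is a hypothetical sentinel error used only in this sketch.
var errFailed = errors.New("job failed")

// runQueue is a hypothetical stand-in for a request queue (not colly's Queue).
// It feeds jobs to `workers` goroutines and stops handing out work as soon as
// any job fails. It returns the number of jobs that completed successfully and
// the first error seen. It assumes workers >= 1.
func runQueue(jobs []int, workers int, process func(int) error) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ch := make(chan int) // unbuffered: a job is handed over only when a worker is free
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex // guards done and firstErr
		done     int
		firstErr error
	)

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range ch {
				if ctx.Err() != nil {
					continue // already cancelled: drop the job without processing it
				}
				if err := process(job); err != nil {
					mu.Lock()
					if firstErr == nil {
						firstErr = err
					}
					mu.Unlock()
					cancel() // signal the producer and the other workers to stop
					continue
				}
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}

	// Producer: stop feeding as soon as the context is cancelled.
feed:
	for _, job := range jobs {
		select {
		case <-ctx.Done():
			break feed
		case ch <- job:
		}
	}
	close(ch)
	wg.Wait()
	return done, firstErr
}

func main() {
	jobs := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	done, err := runQueue(jobs, 2, func(job int) error {
		if job == 3 {
			return errFailed // simulate a failing request
		}
		return nil
	})
	fmt.Printf("processed %d jobs before stopping: %v\n", done, err)
}
```

With more than one worker, the number of jobs that finish before the stop takes effect depends on scheduling, because jobs already in flight are allowed to complete. In colly itself, a comparable effect might be achievable by setting a flag in `OnError` and calling `r.Abort()` in `OnRequest` for the remaining requests, though that is an untested assumption here.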