antchfx / antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go
MIT License

Please show another exit method, thanks #11

Open liuzeng01 opened 4 years ago

liuzeng01 commented 4 years ago

I read the example spider; it uses a signal channel as the exit method, but that is not very helpful. Can you show other exit methods? For example, could the spider exit automatically once it has finished all of its scraping work?

zhengchun commented 4 years ago

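package main

import (
    "fmt"
    "net/http"
    "sync/atomic"

    "github.com/antchfx/antch"
)

func main() {
    startURLs := []string{
        "http://dmoztools.net/Computers/Programming/Languages/Python/Books/",
        "http://dmoztools.net/Computers/Programming/Languages/Python/Resources/",
    }

    // quitCh is the exit signal handed to the crawler; main also waits on it below.
    quitCh := make(chan struct{})
    crawler := &antch.Crawler{Exit: quitCh}
    // Emit one item per response so the pipeline runs once for each fetched page.
    spider := antch.HandlerFunc(func(c chan<- antch.Item, _ *http.Response) {
        c <- nil
    })
    crawler.Handle("*", spider)

    // Count items atomically in case the pipeline runs on more than one goroutine.
    var works int32
    crawler.UsePipeline(func(next antch.PipelineHandler) antch.PipelineHandler {
        return antch.PipelineHandlerFunc(func(v antch.Item) {
            // Once every start URL has produced an item, signal exit.
            if atomic.AddInt32(&works, 1) == int32(len(startURLs)) {
                close(quitCh)
            }
        })
    })

    go func() {
        crawler.StartURLs(startURLs)
    }()
    // Block until the pipeline closes quitCh.
    <-crawler.Exit
    fmt.Println("all done")
}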