antchfx / antch

Antch, a fast, powerful and extensible web crawling & scraping framework for Go
MIT License

antch-getstarted demo looping forever #7

Closed dlipiec closed 6 years ago

dlipiec commented 6 years ago

Hi, this is my first post on GitHub, so sorry if I did something wrong. When I run antch-getstarted on Windows 7 (64-bit) with Go 1.9, the program outputs several JSON records and then pauses or loops. Only Ctrl+C terminates it. I'm new to Go, so I was not able to solve this issue myself. Thanks,

zhengchun commented 6 years ago

Hi @dlipiec, which antch version (that is, which commit) are you using? You need the latest version of antch. If you already have the latest version and want to stop the program, you can try this:

c := make(chan struct{})
crawler := &antch.Crawler{
    Exit: c,
}
go func() {
    time.Sleep(1 * time.Second)
    close(c)
}()
<-crawler.Exit
fmt.Println("exit program")

As in the example above, you can declare a channel c and assign it to crawler.Exit; when you want to stop, just close c.

dlipiec commented 6 years ago

I assume I have the latest version of antch, since I ran "go get ..." today. I also ran "dep ensure". Could you provide a working (compilable) version of the main() function from getstarted? As a novice in Go, I'm not able to incorporate your proposal to stop the looping. Thanks.

(Sent by email in reply to zhengchun's comment of 2018-01-02, 13:43.)

zhengchun commented 6 years ago

If you want to stop, just close Crawler.Exit with close(crawler.Exit).

The getstarted project is a console program. If you run go run main.go and only Ctrl+C exits it, that is not a bug.

Antch is a web crawler framework; you decide when the program exits.

The get-started example downloads two pages (http://dmoztools.net/Computers/Programming/Languages/Python/Books/ and http://dmoztools.net/Computers/Programming/Languages/Python/Resources/) and extracts data from them. The crawler then stops working and waits for new pages until you supply a new URL with crawler.StartURLs("http://example.com").

main.go

startURLs := []string{
    "http://dmoztools.net/Computers/Programming/Languages/Python/Books/",
    "http://dmoztools.net/Computers/Programming/Languages/Python/Resources/",
}
crawler.StartURLs(startURLs)
<-crawler.Exit // blocks here until the program is closed; only Ctrl+C exits at this point

zhengchun commented 6 years ago

Sorry, I forgot that the dep ensure command may not fetch the newest version of antch. You can run go get -u github.com/antchfx/antch to update your antch source and then run go run main.go.

dlipiec commented 6 years ago

Thanks for the explanations. However, adding the line below to main.go:

close(crawler.Exit)

causes a compilation error: invalid operation: close(crawler.Exit) (cannot close receive-only channel). My intention is to use antch in a batch script. Thanks.

(Sent by email in reply to zhengchun's comment of 2018-01-02, 14:59.)

zhengchun commented 6 years ago

crawler.Exit is a receive-only channel, so you cannot close it directly. Declare a new channel and assign it to crawler.Exit.

Replace crawler := antch.NewCrawler() with c := make(chan struct{}) and crawler := &antch.Crawler{Exit: c}, and then close c.
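The compile error and its fix can be reproduced in isolation. This is a small sketch, assuming a hypothetical fakeCrawler struct whose Exit field has type <-chan struct{}, as the error message suggests; fakeCrawler and isClosed are illustrative names and not part of antch.

```go
package main

import "fmt"

// fakeCrawler is a hypothetical stand-in with a receive-only Exit field.
type fakeCrawler struct {
	Exit <-chan struct{}
}

// isClosed reports, without blocking, whether the exit signal was given.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	c := make(chan struct{})         // bidirectional: may be closed
	crawler := &fakeCrawler{Exit: c} // the field sees c as receive-only

	// close(crawler.Exit) // would not compile: cannot close receive-only channel

	fmt.Println(isClosed(crawler.Exit)) // false
	close(c)                            // close through the original variable
	fmt.Println(isClosed(crawler.Exit)) // true
}
```

Both variables refer to the same underlying channel; only the static type of crawler.Exit forbids close, which is why keeping c around is enough.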

main.go

c := make(chan struct{})
crawler := &antch.Crawler{
    Exit: c,
}
startURLs := []string{
    "http://dmoztools.net/Computers/Programming/Languages/Python/Books/",
    "http://dmoztools.net/Computers/Programming/Languages/Python/Resources/",
}
crawler.StartURLs(startURLs)
go close(c) // closing c notifies the crawler to exit the program
<-crawler.Exit

zhengchun commented 6 years ago

See: https://github.com/antchfx/antch-getstarted/commit/753c0f05bba294668cdac700851dd447bd8b7e6a