Closed aimuz closed 6 years ago
I initially thought a URL error was causing this problem, but after I switched to www.github.com the behavior is still the same.
Swap c.Wait and c.Visit in the scraper. You have to wait after an async visit.
My initial code was already written that way, but it still has this problem.
package main

import (
	"log"

	"github.com/gocolly/colly"

	"go_spider/bolt_storage"
)

func main() {
	c := colly.NewCollector(
		colly.Async(true),
	)

	// Custom BoltDB-backed storage used in place of the default in-memory one.
	boltStorage := &bolt_storage.Storage{
		Path:       "db.db",
		Mode:       0777,
		BucketName: []byte("colly"),
		Prefix:     "spider_",
		Options:    nil,
	}
	err := c.SetStorage(boltStorage)
	if err != nil {
		// log.Panic logs the message and then panics, so no separate panic call is needed.
		log.Panic("err:", err)
	}

	c.OnRequest(func(request *colly.Request) {
		log.Println("request:", request.URL.String())
	})
	c.OnResponse(func(r *colly.Response) {
		log.Print(r.Request.URL.String())
	})

	// Queue the request first, then wait for the async work to finish.
	c.Visit("https://api.1sapp.com/wemedia/content/articleList?dtu=200&id=700603&page=1")
	c.Wait()
}
This does not solve the problem.
I tried to find the problem with a debugger, but that didn't help.
Can you reproduce the bug without your custom storage (e.g. with the default in-memory storage)?
I did not see this problem when I used the default storage.
Then, your implementation contains a bug. Try to write tests to identify the root cause.
I started to suspect the custom storage, but it does not seem to return any errors. https://github.com/aimuz/colly-bolt-storage
Can you specify the bug? 😀
Sorry, I found the problem: it was the value that IsVisited returned. Thanks 😜
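A small contract check along these lines might catch this kind of bug in a custom storage. It is only a sketch: visitedStore is a local stand-in that mirrors the Visited and IsVisited methods of colly's storage interface, and memStore, alwaysVisited, and checkVisitedContract are hypothetical names made up for illustration, not part of colly or of the linked repository.

```go
package main

import (
	"fmt"
	"sync"
)

// visitedStore mirrors only the two visited-tracking methods of colly's
// storage interface. It is a local stand-in, not the library type.
type visitedStore interface {
	Visited(requestID uint64) error
	IsVisited(requestID uint64) (bool, error)
}

// memStore is a hypothetical, correct in-memory implementation.
type memStore struct {
	mu   sync.RWMutex
	seen map[uint64]bool
}

func newMemStore() *memStore {
	return &memStore{seen: make(map[uint64]bool)}
}

func (s *memStore) Visited(id uint64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seen[id] = true
	return nil
}

func (s *memStore) IsVisited(id uint64) (bool, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.seen[id], nil
}

// alwaysVisited is a hypothetical buggy store that reports every ID as visited.
type alwaysVisited struct{}

func (alwaysVisited) Visited(uint64) error           { return nil }
func (alwaysVisited) IsVisited(uint64) (bool, error) { return true, nil }

// checkVisitedContract verifies that an ID is reported as unvisited before
// Visited is called and as visited afterwards.
func checkVisitedContract(s visitedStore, id uint64) error {
	seen, err := s.IsVisited(id)
	if err != nil {
		return fmt.Errorf("IsVisited before Visited: %w", err)
	}
	if seen {
		return fmt.Errorf("id %d reported as visited before Visited was called", id)
	}
	if err := s.Visited(id); err != nil {
		return fmt.Errorf("Visited: %w", err)
	}
	seen, err = s.IsVisited(id)
	if err != nil {
		return fmt.Errorf("IsVisited after Visited: %w", err)
	}
	if !seen {
		return fmt.Errorf("id %d not reported as visited after Visited", id)
	}
	return nil
}

func main() {
	fmt.Println("memStore:", checkVisitedContract(newMemStore(), 42))
	fmt.Println("alwaysVisited:", checkVisitedContract(alwaysVisited{}, 42))
}
```

When running a check like this against a persistent storage, a fresh database file is probably needed, since IDs recorded by earlier runs would otherwise be reported as visited. Logging the error returned by c.Visit would likely also have made the skipped request visible.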
When I run this code, it exits immediately with no errors.
But when I remove this code, it runs normally.
bolt.go
Why is that? It runs without any errors.