Closed papa-stiflera closed 7 years ago
I can't reproduce the bug, the above code runs without errors. What is your environment (os/go version/colly version)?
$ go version
go version go1.9.1 linux/amd64
colly version: d7069d1f89470f36a4176fd962fa3e40a838cb9b (master)
Hmm, I can reproduce it if I add the -race flag to the go run command, but I don't really understand why this happens.
@papa-stiflera you didn't paste the whole code you're using, so we can't reproduce this. The end of your log points to /home/skruglov/Projects/go/src/crawler/main.go:19 +0xe4, but the code you gave us isn't that long (maybe because of the import statements? I don't know...), so I assume your data race is somewhere else and has nothing to do with the net/http package or colly. Please give us the necessary information so we can help you, thank you!
@kataras thanks for the debugging. My first assumption was the same, then I tried the above code, and the bug is reproducible - even with the below snippet:
package main

import (
	"github.com/asciimoo/colly"
)

func main() {
	colly.NewCollector().Visit("https://en.wikipedia.org/")
}
The strange thing is that if I change the URL to a service which doesn't support HTTP/2 (e.g. to https://httpbin.org/) the race disappears.
UPDATE:
It's pretty clear that the bug is somehow connected to the HTTP/2 support.
This command doesn't fail:
GODEBUG='http2client=0' go run -race t/test_36.go
and this fails:
GODEBUG='http2client=1' go run -race t/test_36.go
@asciimoo It's funny because the race detector couldn't find that on the first try; I had to re-run the program more than 4 times to see the race log...
Update:
I managed to "fix" that by locking around client.Do; see below and test it yourself. If that works, just put locks there.
@kataras unfortunately your suggested solution is not applicable, because it prevents parallelism in httpBackend, and the error doesn't disappear for me if I run GODEBUG='http2client=1' go run -race xy.go.
I know @asciimoo... I suggested it as a temporary solution. I don't know the whole code base, so I can't help any further for now, but if it's a net/http issue then you have to file an issue there :/
The bug is reproducible without colly. The following code demonstrates the problem:
package main

import (
	"net/http"
	"net/http/cookiejar"
)

func main() {
	jar, _ := cookiejar.New(nil)
	client := &http.Client{Jar: jar}
	client.CheckRedirect = func(req *http.Request, via []*http.Request) error {
		lastRequest := via[len(via)-1]
		// The redirected request now shares the previous request's header map.
		req.Header = lastRequest.Header
		return nil
	}
	client.Get("https://en.wikipedia.org/")
}
It seems the bug only appears over HTTP/2, when the client has a cookie jar and a custom redirect handler that writes to http.Request.Header.