JustinTimperio opened this issue 3 years ago
Try sending SIGQUIT (Ctrl+\). Go will dump goroutine stack traces and exit. The stack traces might help figure out where they got stuck.
^\SIGQUIT: quit
PC=0x46d301 m=0 sigcode=128
goroutine 0 [idle]:
runtime.futex(0xec6450, 0x80, 0x0, 0x0, 0x7ffd00000000, 0x0, 0xadaf50, 0x0, 0x7ffd8e7e4268, 0x40c17f, ...)
/usr/lib/go/src/runtime/sys_linux_amd64.s:579 +0x21
runtime.futexsleep(0xec6450, 0x0, 0xffffffffffffffff)
/usr/lib/go/src/runtime/os_linux.go:44 +0x46
runtime.notesleep(0xec6450)
/usr/lib/go/src/runtime/lock_futex.go:159 +0x9f
runtime.mPark()
/usr/lib/go/src/runtime/proc.go:1340 +0x39
runtime.stopm()
/usr/lib/go/src/runtime/proc.go:2301 +0x92
runtime.findrunnable(0xc000033000, 0x0)
/usr/lib/go/src/runtime/proc.go:2960 +0x72e
runtime.schedule()
/usr/lib/go/src/runtime/proc.go:3169 +0x2d7
runtime.goexit0(0xc004011980)
/usr/lib/go/src/runtime/proc.go:3478 +0x1de
runtime.mcall(0x7f67cd18cce7)
/usr/lib/go/src/runtime/asm_amd64.s:327 +0x5b
goroutine 1 [semacquire, 2 minutes]:
sync.runtime_Semacquire(0xc00035c108)
/usr/lib/go/src/runtime/sema.go:56 +0x45
sync.(*WaitGroup).Wait(0xc00035c100)
/usr/lib/go/src/sync/waitgroup.go:130 +0x65
github.com/gocolly/colly/v2.(*Collector).Wait(...)
/home/conductor/go/pkg/mod/github.com/gocolly/colly/v2@v2.1.0/colly.go:812
plHttpCrawler/crawler.RunJob(0xc00035ced0, 0xc, 0x1f48, 0x58, 0xfa, 0x2710, 0xc00035cf18, 0x4, 0x1, 0x61280f41, ...)
/home/conductor/plHttpCrawler/crawler/crawl.go:201 +0x56a
plHttpCrawler/crawler.Crawl(0xc00035ced0, 0xc, 0x1f48, 0x58, 0xfa, 0x2710, 0xc00035cf18, 0x4, 0x1, 0x61280f41, ...)
/home/conductor/plHttpCrawler/crawler/crawl.go:37 +0x5d8
main.main()
/home/conductor/plHttpCrawler/.BUILD_SOURCE/plHttpCrawler.go:40 +0x878
goroutine 849 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x7f67a647a5a0, 0x72, 0xffffffffffffffff)
/usr/lib/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0035e2698, 0x72, 0x200, 0x200, 0xffffffffffffffff)
/usr/lib/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
/usr/lib/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0035e2680, 0xc00451c000, 0x200, 0x200, 0x0, 0x0, 0x0)
/usr/lib/go/src/internal/poll/fd_unix.go:166 +0x1d5
net.(*netFD).Read(0xc0035e2680, 0xc00451c000, 0x200, 0x200, 0x7f67a6149ef0, 0xc000918e00, 0x0)
/usr/lib/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e910, 0xc00451c000, 0x200, 0x200, 0x0, 0x0, 0x0)
/usr/lib/go/src/net/net.go:183 +0x91
net/http.(*readWriteCloserBody).Read(0xc000535890, 0xc00451c000, 0x200, 0x200, 0xc00451a035, 0xc00451c000, 0x7f67a6149ef0)
/usr/lib/go/src/net/http/transport.go:2371 +0x102
net/http.(*cancelTimerBody).Read(0xc001384580, 0xc00451c000, 0x200, 0x200, 0x972700, 0x9c8101, 0xc00451c000)
/usr/lib/go/src/net/http/client.go:939 +0x62
io.(*LimitedReader).Read(0xc0005358a8, 0xc00451c000, 0x200, 0x200, 0xc0005358a8, 0x0, 0xc000040400)
/usr/lib/go/src/io/io.go:473 +0x63
io.ReadAll(0xae8480, 0xc0005358a8, 0x0, 0x0, 0xffffffffffffffff, 0xc000e7b3e0, 0x0)
/usr/lib/go/src/io/io.go:633 +0xdf
io/ioutil.ReadAll(...)
/usr/lib/go/src/io/ioutil/ioutil.go:27
github.com/gocolly/colly/v2.(*httpBackend).Do(0xc000071440, 0xc000918e00, 0x2540be400, 0xc0013fff38, 0x0, 0x0, 0x0)
/home/conductor/go/pkg/mod/github.com/gocolly/colly/v2@v2.1.0/http_backend.go:209 +0x349
github.com/gocolly/colly/v2.(*httpBackend).Cache(0xc000071440, 0xc000918e00, 0x2540be400, 0xc001138f38, 0x0, 0x0, 0x0, 0x0, 0x0)
/home/conductor/go/pkg/mod/github.com/gocolly/colly/v2@v2.1.0/http_backend.go:134 +0x77
github.com/gocolly/colly/v2.(*Collector).fetch(0xc0000f6700, 0xc0000292e0, 0x19, 0xa40e5d, 0x3, 0x1, 0x0, 0x0, 0xc000ad2a50, 0xc000924210, ...)
/home/conductor/go/pkg/mod/github.com/gocolly/colly/v2@v2.1.0/colly.go:653 +0x2b3
created by github.com/gocolly/colly/v2.(*Collector).scrape
/home/conductor/go/pkg/mod/github.com/gocolly/colly/v2@v2.1.0/colly.go:574 +0x3e5
rax 0xca
rbx 0xec6300
rcx 0x46d303
rdx 0x0
rdi 0xec6450
rsi 0x80
rbp 0x7ffd8e7e4230
rsp 0x7ffd8e7e41e8
r8 0x0
r9 0x0
r10 0x0
r11 0x286
r12 0x0
r13 0x1c65
r14 0xadaf50
r15 0x0
rip 0x46d301
rflags 0x286
cs 0x33
fs 0x0
gs 0x0
I've created the following repository to show the bug in action: https://github.com/Project-Lernaean/CollyTimeoutError
To reproduce, run go run main.go; it will iterate through a list of 100 IP:port combos. It will always hang endlessly on at least one IP.
My best guess is this is related to https://github.com/golang/go/issues/27752#issuecomment-492677602. The Dialer.Deadline should be setting the connection deadline, but sometimes it doesn't take effect. If you reduce the number of URLs, sometimes they all work and sometimes they don't. Maybe reducing the test case to remove Colly and demonstrating the problem with a plain http.Transport will help get to the underlying issue.
@dherbst Thanks for the advice. I had a feeling this was lower-level than colly, but I was hoping it wasn't an issue with the HTTP package. I can modify my bug demonstration to not use colly and submit it to the Go repo, but I'll do some more testing before then.
@JustinTimperio I rewrote your test without colly and I can reproduce the hanging issue with a subset of your url list here: https://github.com/dherbst/deadlinetest
Awesome work @dherbst! I can submit this to the Go source repo with a full explanation of what's going on with the bug later this afternoon.
Then c.SetRequestTimeout() should probably do what this comment says, right? I assume it doesn't currently, as otherwise @JustinTimperio wouldn't have had the issue.
net.Dialer.Timeout and net.Dialer.Deadline are indeed for controlling the deadline of the dial operation only. Setting a request deadline with http.NewRequestWithContext should work in all circumstances, including when reads or writes to the underlying connection are slow. If that doesn't work, we have a bug, but that does not seem to be the case here.
But Collector.SetRequestTimeout sets net/http.Client's Timeout, which looks correct. And thus the repro in github.com/dherbst/deadlinetest is wrong, as it only sets the dialer's timeout.
I see that now, thanks. Fixed the test.
For me https://github.com/dherbst/deadlinetest doesn't hang. Also https://github.com/Project-Lernaean/CollyTimeoutError doesn't hang (even when using urls.txt.bak). Also, https://github.com/Project-Lernaean/CollyTimeoutError doesn't actually use SetRequestTimeout anywhere in the code.
For me https://github.com/dherbst/deadlinetest doesn't hang.
It will not hang any more, I fixed the test.
It was originally hanging because I based it off the CollyTimeoutError repo, which omitted the timeout. I haven't looked at this code in a couple of years, but if I remember correctly the issue originally reported is fixed by setting the request timeout as I have in my test. This issue should probably be closed.
And yet the original report had c.SetRequestTimeout, which has always set net/http.Client's Timeout.
Something isn't right, but I'm inclined to close this until some concrete reproduction arrives.
I can confirm that setting the Deadline in @dherbst's example solves the issue. So in the OP's example there should be no hanging requests, as seen below. My suspicion is that it's hanging because it cannot dial out and the lack of a deadline leaves the connection open.
c := colly.NewCollector(
	colly.MaxBodySize(10e9),
	colly.DetectCharset(),
	colly.Async(true),
)
c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 500})
c.SetRequestTimeout(5 * time.Second)
c.WithTransport(&http.Transport{
	Proxy: http.ProxyFromEnvironment,
	DialContext: (&net.Dialer{
		Timeout:   5 * time.Second,
		KeepAlive: 5 * time.Second,
		Deadline:  time.Now().Add(5 * time.Second), // <---- add deadline (an absolute time.Time, not a duration)
		DualStack: true,
	}).DialContext,
	MaxIdleConns:          500,
	IdleConnTimeout:       5 * time.Second,
	TLSHandshakeTimeout:   5 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
})
Edit: in my test environment I was able to process a significant quantity of links, but eventually ran into a hung process, so maybe this solves some situations but not all.
Hi all, I am part of a research project called "Project-Lernaean". We are working on mapping and surveying the internet for both developers and security researchers. We have been using Colly quite successfully in our scans, but recently I have run into a fairly obscure issue. My current guess is that honeypots create endless pages that deadlock a few jobs indefinitely. I have set very aggressive timeouts, which you can see below, but even so, typically in a job of 10k IPs, 4-15 will fully deadlock and c.Wait() never completes. I would appreciate any guidance that the community has on dealing with this issue. I am also happy to create a test repository that can duplicate this issue. If possible, adding a c.ForceClose() function would be extremely helpful in the case that workers, for whatever reason, are unable to terminate.