Open AlexS778 opened 9 months ago
Because `c.OnResponse` is executed 5 times in the loop, and each call appends its callback to `c.responseCallbacks`, each goroutine executes all the functions in `c.responseCallbacks` when it completes the request.
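To make that explanation concrete, here is a minimal, stdlib-only sketch of the append-on-register behaviour. The `collector` type, `handleResponse`, and `callsPerResponse` are hypothetical names made up for illustration; this is a simplified model, not colly's actual implementation, and it leaves out the goroutines entirely.

```go
package main

import "fmt"

// collector is a hypothetical, simplified stand-in for a crawler type.
// It only models the "append on every registration" behaviour described above.
type collector struct {
	responseCallbacks []func(body string)
}

// OnResponse appends fn to the callback list; earlier entries are never replaced.
func (c *collector) OnResponse(fn func(body string)) {
	c.responseCallbacks = append(c.responseCallbacks, fn)
}

// handleResponse runs every registered callback for one finished request.
func (c *collector) handleResponse(body string) {
	for _, fn := range c.responseCallbacks {
		fn(body)
	}
}

// callsPerResponse registers the same callback either inside the visit loop
// or once before it, then reports how often it fires for a single response.
func callsPerResponse(visits int, registerInLoop bool) int {
	c := &collector{}
	calls := 0
	cb := func(string) { calls++ }

	if !registerInLoop {
		c.OnResponse(cb) // registered exactly once
	}
	for i := 0; i < visits; i++ {
		if registerInLoop {
			c.OnResponse(cb) // registered again on every iteration
		}
	}

	c.handleResponse("body")
	return calls
}

func main() {
	fmt.Println("registered in loop:", callsPerResponse(5, true))
	fmt.Println("registered once:", callsPerResponse(5, false))
}
```

If this model matches what the collector does, moving the `c.OnResponse(...)` call out of the loop, so it runs once before the first `c.Visit`, should leave one callback per response.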
Hello guys, recently I was using the crawler to crawl some stuff and it was taking quite a lot of time, so I decided to use async mode. While using async mode I noticed a lot of duplicates in my results; in particular, the number of duplicates matched the number of threads I was launching my crawler with.
Here is a quick example, taken from the official docs: https://github.com/gocolly/colly/blob/master/_examples/rate_limit/rate_limit.go
If we launch this code, we can see the results:
A lot of text here with the HTTP response body
```json
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
```

As you can see, there are duplicates in the results. Maybe I'm doing something wrong and not setting up the crawler properly, but I highly doubt this is intended behaviour. Anyway, I would appreciate any help.