Closed ranisalt closed 2 years ago
Can't reproduce. Although it prints OnRequest for all URLs at once, requests on the wire are properly limited (you can verify it with tcpdump).
```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(colly.Async(true))
	c.Limit(&colly.LimitRule{DomainGlob: "*", Parallelism: 2})

	c.OnRequest(func(r *colly.Request) {
		log.Printf("OnRequest %s", r.URL.String())
	})
	c.OnResponse(func(r *colly.Response) {
		log.Printf("OnResponse %s", r.Request.URL.String())
	})

	for i := 0; i < 8; i++ {
		c.Visit(fmt.Sprintf("http://httpbin.org/delay/2?%d", i))
	}
	c.Wait()
}
```
I will try that again, it's been a while. I was accidentally DoSing the website every time :laughing:
You might want to add Delay as well, though, because otherwise the request rate is limited only by the server's response rate. Maybe that was the reason for your DoS to begin with?
"Although it prints OnRequest for all URLs at once, requests on the wire are properly limited." Looks like a very important point.
I am scraping thousands of pages from a website, which is painfully slow to do in sequence, so I'm trying to use async mode. However, if I fire too many requests at once (and it's not that many), the service simply crashes and takes a while to restart. So I referred to the rate limit documentation. Unfortunately, it simply does nothing.

Considering the following code:

I expected at most 4 requests to fire at once; however, I see Visiting <url> for all URLs almost instantly, so it is clearly firing every request at once, and this is confirmed by checking that the server I'm visiting is indeed down. It does not work even if I set Parallelism to 1, which has the very same outcome. And before someone asks, no, other values of DomainGlob such as *domain.com* don't work either.