-
There are five main callbacks in colly and they are:
1. OnRequest
2. OnError
3. OnResponse
4. OnHTML
5. OnScraped
We want to show readers these callbacks at the very beginning, so how about we …
-
Is it possible to create randomized delays, i.e., per-request delays selected from some range or based on some random factor? I couldn't think of a good way to do this, other than maybe cycling t…
-
When using colly, I need to iterate over the elements stored in the context after several `OnHTML` callbacks on different HTML elements have put values into it.
This is the simple function I wrote.
```
// F…
-
Currently you can specify `URLFilters` to include URLs. Is there any way to exclude URLs?
-
I found
```
rp, err := proxy.RoundRobinProxySwitcher("socks5://127.0.0.1:1337", "socks5://127.0.0.1:1338")
if err != nil {
	log.Fatal(err)
}
c.SetProxyFunc(rp)
```
but if I have ten URLs n…
-
[Not really an issue]
Hey mate
I've been using Colly for a small scraping project and I've come across a weird bit of behaviour.
The `e.ChildText()` function returns the text in _all_ of the c…
-
Example code:
```
func main() {
	c := colly.NewCollector()
	// Find and visit all links
	c.OnHTML("a", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Println(link)
		c.Visit(e.R…
-
Hey mate!
I'm loving colly so far. I'm new to the Go programming language and I've just been messing around with your scraping library and found a weird bug.
I was just testing out scraping my w…
-
The purpose is to filter out mismatched URLs, for example:
package main
import (
"github.com/asciimoo/colly"
"fmt"
"time"
)
func main() {
urls := []string{"https://httpbin.org/hel…
-
For batch crawls, memory grows very fast and colly becomes very slow.