-
Running this code, the program exits immediately with no output and no errors.
But when I remove this code, it runs normally.
```go
package main

import (
	"github.com/gocolly/colly"
	"log"
	"go_spider/bolt_storage…
```
aimuz updated 6 years ago
-
I found a strange case where setting the user agent causes the crawl to not work, while `curl` gets a response from the same site just fine with the same user agent. I provided example code below, which you can run with a…
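One plausible cause (an assumption, since the site and user-agent string in the issue are truncated) is server-side user-agent sniffing: the server varies its response on the `User-Agent` header, so one string is blocked while another succeeds. A minimal stdlib-only sketch reproducing that class of behavior, with a hypothetical blocked UA:

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newUASniffingServer simulates a site that varies its response on the
// User-Agent header: one hypothetical UA string is rejected, everything
// else gets 200. The UA value is illustrative, not from the issue.
func newUASniffingServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("User-Agent") == "BadBot/1.0" {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		fmt.Fprint(w, "ok")
	}))
}

// fetchStatus performs a GET with the given User-Agent and returns the
// HTTP status code, mirroring what curl or a collector would see.
func fetchStatus(rawURL, ua string) int {
	req, err := http.NewRequest("GET", rawURL, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("User-Agent", ua)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return resp.StatusCode
}

func main() {
	srv := newUASniffingServer()
	defer srv.Close()
	fmt.Println(fetchStatus(srv.URL, "BadBot/1.0")) // blocked UA: 403
	fmt.Println(fetchStatus(srv.URL, "curl/8.0"))   // any other UA: 200
}
```

Comparing the collector's exact request headers against curl's (e.g. via a debugger or a request bin) usually narrows down which header the server is keying on.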
-
Quick question.
I need to scrape documentation from web sites. After I have grabbed the documentation from a site, I need to run jobs that go back every day and fetch only what is new.
Do you…
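Independent of colly, one common way to "only get anything new" on a daily re-crawl is to remember a hash of each page's body and skip pages whose hash is unchanged. A hedged sketch (the store here is in-memory; a real daily job would persist it to a file or database between runs):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// SeenStore maps URL -> hex-encoded SHA-256 of the last body seen.
// In-memory for the sketch; persist it between runs in practice.
type SeenStore map[string]string

// IsNew reports whether the body at url changed since the last run,
// and records the new hash so the next run can compare against it.
func (s SeenStore) IsNew(url string, body []byte) bool {
	h := sha256.Sum256(body)
	hexHash := hex.EncodeToString(h[:])
	if s[url] == hexHash {
		return false // body unchanged since last crawl: skip
	}
	s[url] = hexHash
	return true
}

func main() {
	seen := SeenStore{}
	fmt.Println(seen.IsNew("https://example.com/docs", []byte("v1"))) // first visit: new
	fmt.Println(seen.IsNew("https://example.com/docs", []byte("v1"))) // unchanged: skip
	fmt.Println(seen.IsNew("https://example.com/docs", []byte("v2"))) // changed: new
}
```

A response callback would call `IsNew` before doing any expensive processing; checking `Last-Modified`/`ETag` headers first can avoid even downloading unchanged pages.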
ghost updated 6 years ago
-
Hi,
According to the [colly documentation](http://go-colly.org/docs/best_practices/distributed/#distributed-scrapers) on distributed scrapers:
```
the best you can do is wrapping the scraper in a server. …
```
-
I am passing information between collectors and setting scraped data using the context. Actually I do something like `ctx.Put("collector.From", X)` and `ctx.Put("object.Name", X)`. This adds some overhead t…
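One way to cut the per-key cost of many string-keyed `Put` calls is to bundle related values into a single typed struct stored under one key, so there is one lookup and one type assertion instead of several. A minimal sketch using a map-based stand-in for a string-keyed scrape context (the real `colly.Context` API is not reproduced here; this only illustrates the idea):

```go
package main

import "fmt"

// Ctx is a stand-in for a string-keyed scrape context: every value costs
// a map lookup plus interface{} boxing and unboxing.
type Ctx map[string]interface{}

func (c Ctx) Put(k string, v interface{}) { c[k] = v }
func (c Ctx) GetAny(k string) interface{} { return c[k] }

// Meta bundles values that would otherwise live under separate keys
// ("collector.From", "object.Name", ...) into one typed struct, so a
// single Put/GetAny and one type assertion cover all of them.
type Meta struct {
	From string
	Name string
}

func main() {
	ctx := Ctx{}
	ctx.Put("meta", Meta{From: "listCollector", Name: "product-42"})

	if m, ok := ctx.GetAny("meta").(Meta); ok {
		fmt.Println(m.From, m.Name) // listCollector product-42
	}
}
```

The struct also gives compile-time field names instead of stringly-typed keys, which tends to matter more than the raw lookup cost.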
-
Hi, thank you for building such a powerful library!
I was testing Colly with various websites, and it seems colon-separated URLs are sometimes ignored, especially on paginated URLs.
For example, …
-
This would essentially replicate the behavior of the old bot, which scrapes Google for GIFs. I know it's not best practice to scrape the web; however, I believe this has the best results for getting GIF…
-
I'm having a slight issue with App Engine and wondering whether I've missed a config step to get colly working correctly. Sorry if this is the wrong place to be posting this.
I created my code …
-
```go
package main

import (
	"fmt"
	"time"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/debug"
)

func main() {
	c := colly.NewCollector(
		colly.UserAgent("Mozilla/5.0 (Windo…
```
-
I'm relatively new to Go, so while I realize the URL parsing is done using Go's core libraries, I have still found an issue that may be valuable to solve in a crawling project.
If you look a…