gocolly / colly

Elegant Scraper and Crawler Framework for Golang
https://go-colly.org/
Apache License 2.0
23.2k stars 1.76k forks

How would I pass a HTMLElement to a Collector like .visit()? #683

Open winston-bosan opened 2 years ago

winston-bosan commented 2 years ago

TL;DR: How would I pass an HTMLElement to a Collector instead of asking it to visit a URL?

cases.OnHTML("div.col-result", func(e *colly.HTMLElement) {
    log.Println("Case result found:", e)
    // This is what I want: a hypothetical method that hands e to another collector.
    case_detail_link.pass_html_to_collection(e)
})

// case_detail_link is a collector
case_detail_link.OnHTML("a.cta-primary", func(e *colly.HTMLElement) {
    log.Println(e.Attr("href"))
})

"Why do you want it this way? Why not just...": Hey I totally get it. I could simply just have the 2nd collector logic embedded in the first cases.OnHTML call. However, because I am toying around with the idea of doing building JSON config files in real time into colly-based executables, I think this is a behavior I need to make it work.

WGH- commented 2 years ago
cases.OnHTML("div.col-result", func(e *colly.HTMLElement) {
    log.Println("Case result found:", e)
    e.ForEach("a.cta-primary", func(_ int, e *colly.HTMLElement) {
        log.Println(e.Attr("href"))
    })
})

is this what you're trying to do?

winston-bosan commented 2 years ago

cases.OnHTML("div.col-result", func(e *colly.HTMLElement) {
    log.Println("Case result found:", e)
    e.ForEach("a.cta-primary", func(_ int, e *colly.HTMLElement) {
        log.Println(e.Attr("href"))
    })
})

is this what you're trying to do?

Yes, it is. Thank you for taking the time to write it out. However, as I said in the "Why not just..." section, I have a specific goal in mind, so I wonder whether it is possible to feed the HTML to another collector.