Open JDRanpariya opened 3 years ago
You could add an OnHTML("html", ...) handler that would do e.DOM.Find(...) with a concrete selector that depends on the site you're currently processing. Not ideal, but still better than instantiating different collectors, I think.
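A minimal sketch of the per-site lookup such a handler could use, assuming the site is identified by the request's hostname. The siteSelectors type, the selectorsByHost table, the selectorsFor helper, and the example hosts and selectors are all hypothetical and not part of colly. The sketch uses only the standard library, and the colly wiring is shown in a comment:

```go
package main

import "net/url"

// siteSelectors holds the CSS selectors for one blog.
// Hypothetical type for illustration; not part of colly.
type siteSelectors struct {
	Title, Author, Likes, Content string
}

// selectorsByHost maps a hostname to that site's selectors.
// The hosts and selectors below are made up.
var selectorsByHost = map[string]siteSelectors{
	"blog-a.example": {Title: "h1.post-title", Author: ".byline a", Likes: ".likes-count", Content: "article .body"},
	"blog-b.example": {Title: "header h2", Author: "span.author", Likes: "button.like span", Content: "div.entry"},
}

// selectorsFor returns the selectors registered for the host of rawURL.
// The boolean is false when the URL cannot be parsed or the host is unknown.
func selectorsFor(rawURL string) (siteSelectors, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return siteSelectors{}, false
	}
	s, ok := selectorsByHost[u.Hostname()]
	return s, ok
}

// Inside a single collector, the lookup could be wired up roughly like this:
//
//	c.OnHTML("html", func(e *colly.HTMLElement) {
//		s, ok := selectorsFor(e.Request.URL.String())
//		if !ok {
//			return // no selectors registered for this site
//		}
//		title := e.DOM.Find(s.Title).Text()
//		author := e.DOM.Find(s.Author).Text()
//		_, _ = title, author
//	})
```

Keeping the selectors in one table means adding an eleventh blog is a new map entry rather than a new collector or handler.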
Also, does colly have a concept of a data pipeline like the one in Python's Scrapy?
I don't think so.
It's worth trying. Will I have to add a try/catch, or will colly just ignore it if the specific selector passed to OnHTML() is not found?
Also, is there any feature update you are working on to add a data pipeline?
Will I have to add a try/catch, or will colly just ignore it if the specific selector passed to OnHTML() is not found?
"No matching elements" is not an error here.
Also, is there any feature update you are working on to add a data pipeline?
I don't think so. Colly leans more toward being a crawler framework than a scraping library.
Question: I have to scrape 10+ different blogs for articles. I need fields like title, author, likes, content, etc., but each site has different CSS selectors for the fields I've mentioned. How would I incorporate this in one colly instance? For example, I can crawl all sites using c.Visit(site) and get results for all of them, but how do I write a separate parsing pipeline for each site? Also, does colly have a concept of a data pipeline like the one in Python's Scrapy?