go-rod / rod

A Chrome DevTools Protocol driver for web automation and scraping.
https://go-rod.github.io
MIT License

Create a lib like colly but only do scheduling and collecting #133

Open ysmood opened 4 years ago

ysmood commented 4 years ago

See https://github.com/gocolly/

Colly is not well suited to scraping dynamic pages, for example when a site's APIs have complicated dependencies on one another and you don't want to spend time studying how they work, especially when the APIs are undocumented.

The lib should focus only on scheduling and collecting, that is, on abstracting recursion and concurrency control. Low-level network customization is not a priority, because users who turn to headless browsers to solve their problems normally care more about the cost of reverse engineering than about performance.

The lib should be backend-agnostic, so that any backend can use it, not just rod.

ysmood commented 4 years ago

If you are interested in it and want to help us implement it, please leave a message here.

mehul-dev commented 3 years ago

I would definitely like to help with this. Let me know how to proceed.

ysmood commented 3 years ago

@mehul-dev Great! We can talk about the details in the chat room; the address is in the readme.