alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.24k stars 654 forks

Add support for incremental learning #18

Closed Narasimha1997 closed 4 years ago

Narasimha1997 commented 4 years ago

As of now, the rules are formed all at once from the targets specified in wanted_list, and the stack list is generated for those targets. In some scenarios I need to update the existing stack list with new rules learnt from a different set of targets on the same URL. As seen in the build method, a new stack list is created every time build is called. Please provide an update method that updates the stack list by appending the rules learnt from the new set of targets. This would be very useful because it would let developers add new targets incrementally while retaining the older rules.
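A minimal sketch of the difference between rebuilding and updating, to make the request concrete. `ToyScraper`, its `_learn` stub, and the rule dictionaries are hypothetical stand-ins, not the library's actual classes or rule format; only the names `build`, `update`, `wanted_list`, and `stack_list` come from the description above.

```python
class ToyScraper:
    """Hypothetical stand-in for a scraper that keeps learnt rules in stack_list."""

    def __init__(self):
        self.stack_list = []

    def _learn(self, url, wanted_list):
        # Placeholder for real rule learning: one fake rule per target.
        return [{"url": url, "target": target} for target in wanted_list]

    def build(self, url, wanted_list):
        # Behaviour described in the issue: the stack list is recreated,
        # so rules from earlier calls are lost.
        self.stack_list = self._learn(url, wanted_list)
        return self.stack_list

    def update(self, url, wanted_list):
        # Requested behaviour: keep the old rules and append only new ones.
        for rule in self._learn(url, wanted_list):
            if rule not in self.stack_list:
                self.stack_list.append(rule)
        return self.stack_list


scraper = ToyScraper()
scraper.build("https://example.com/item", ["Title"])
scraper.update("https://example.com/item", ["Price"])
targets = [rule["target"] for rule in scraper.stack_list]
print(targets)
```

Calling `build` again at this point would discard both rules, which is the limitation the request is about.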

alirezamika commented 4 years ago

Sounds like a good idea. I think the new targets could also come from different URLs. Suppose you want to build a price scraper: you can feed it samples from different websites, like Amazon, eBay, etc.

Narasimha1997 commented 4 years ago

Yes! That would be good. This would be like a scraping marketplace: learn rules -> increment the learning (add more rules from different sources) -> publish the rules so others can consume them. Just like TensorFlow Hub and Model Zoo for deep learning, haha.
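The learn -> increment -> publish flow could look roughly like the sketch below. Everything here is an assumption for illustration: the helper names (`merge_rules`, `publish`, `consume`), the rule fields, and the selectors are made up, and JSON is just one possible exchange format.

```python
import json


def merge_rules(existing, new):
    """Return the existing rules plus any new rules not already present."""
    merged = list(existing)
    for rule in new:
        if rule not in merged:
            merged.append(rule)
    return merged


def publish(rules):
    """Serialize rules to a JSON string that could be shared with others."""
    return json.dumps({"stack_list": rules}, sort_keys=True)


def consume(payload):
    """Load rules back from a published JSON string."""
    return json.loads(payload)["stack_list"]


# Hypothetical rules learnt from two different sources.
amazon_rules = [{"source": "amazon", "selector": "span.price"}]
ebay_rules = [{"source": "ebay", "selector": "div.price"}]

shared = merge_rules(amazon_rules, ebay_rules)  # increment the learning
payload = publish(shared)                       # publish
restored = consume(payload)                     # someone else consumes
```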

alirezamika commented 4 years ago

Awesome!

alirezamika commented 4 years ago

#19