andykais / scrape-pages

generalized scraper using a single instruction set for any site that can be statically scraped
https://scrape-pages.js.org
MIT License

add `limit` flag to ScrapeConfig #9

Closed: andykais closed this issue 5 years ago

andykais commented 5 years ago

`incrementUntil` and `limit` should work together: the scrape stops at whichever bound is reached first.
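The "whichever bound trips first" rule can be sketched as a single predicate. This is purely illustrative; the function name and parameters below are assumptions, not the library's actual API:

```typescript
// Hypothetical sketch: continue scraping only while BOTH bounds allow it.
function shouldContinue(
  index: number,         // zero-based count of increments completed so far
  limit: number,         // max increments allowed by `limit`
  conditionMet: boolean  // e.g. a download failed (`incrementUntil: 'failed-download'`)
): boolean {
  return index < limit && !conditionMet;
}

console.log(shouldContinue(99, 100, false));  // true: under limit, condition not met
console.log(shouldContinue(100, 100, false)); // false: limit reached first
console.log(shouldContinue(5, 100, true));    // false: incrementUntil condition hit first
```

Either bound alone stopping the loop is what makes the two flags composable rather than mutually exclusive.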

andykais commented 5 years ago

Here's the dilemma. Either we put `limit` on `parse` and be done with it, or we put `limit` on `ScrapeConfig`, which means more complicated state to deal with but also more configuration possibilities. E.g., we could then stop a `scrapeNext` clause early, or use `incrementUntil: 'failed-download'` in combination with `limit: 100`. The downside is that it encourages more separation: e.g., we may now split a `parse` from a `download` so that the download may increment while each parse is also limited.

Separating it from `parse` seems more frustrating but also more useful, especially since right now we have no way to stop a `scrapeNext` clause except by reaching the end of a cycle.
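The two placements being weighed might look roughly like this. All field names below are illustrative assumptions, not the library's actual config schema:

```typescript
// Option A: `limit` scoped to a parse step. Simple, but it cannot
// interact with pagination or incrementUntil.
const limitOnParse = {
  parse: { selector: 'img', attribute: 'src', limit: 100 },
};

// Option B: `limit` at the ScrapeConfig level, where it can combine
// with incrementUntil and stop a scrapeNext loop early.
const limitOnConfig = {
  download: 'https://example.com/gallery?page={index}',
  incrementUntil: 'failed-download', // keep paging until a download fails...
  limit: 100,                        // ...or 100 increments, whichever comes first
  parse: { selector: 'img', attribute: 'src' },
};

console.log(limitOnParse.parse.limit, limitOnConfig.limit);
```

Option B is the "more complicated state, more possibilities" branch of the dilemma above.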

andykais commented 5 years ago

I have a solution that seems fairly nice to me. It introduces new emitters, and therefore makes the code less declarative, but when dealing with global state I believe it is wisest to push it out of the project and not assume what the developer wants to do. Here is the new plan:
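The general emitter idea, pushing the stopping decision out to the developer instead of hard-coding limit state inside the library, could be sketched like this. Every name here is an assumption for illustration only:

```typescript
import { EventEmitter } from 'events';

// Illustrative sketch: the library emits events and hands the developer
// a stop() callback, rather than owning a global limit counter itself.
class ScraperEmitter extends EventEmitter {
  private stopped = false;

  // Returns false once a listener has asked the scrape to stop.
  emitPage(pageIndex: number): boolean {
    if (!this.stopped) {
      this.emit('page', pageIndex, () => { this.stopped = true; });
    }
    return !this.stopped;
  }
}

const scraper = new ScraperEmitter();

// The "limit" now lives in developer code, outside the library:
scraper.on('page', (index: number, stop: () => void) => {
  if (index >= 3) stop();
});

const scrapedPages: number[] = [];
for (let i = 0; ; i++) {
  if (!scraper.emitPage(i)) break;
  scrapedPages.push(i);
}
console.log(scrapedPages); // [ 0, 1, 2 ]
```

The trade-off stated above holds in the sketch too: the config is less declarative, but the library no longer has to guess what the developer's stopping condition is.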

andykais commented 5 years ago

Add a functional test suite for download caching and limiting.
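A functional test for download caching might assert that repeated requests for the same URL hit the network only once. A framework-free sketch (a real suite would use the project's test runner; the cache shape here is assumed):

```typescript
// Hypothetical in-memory download cache for illustration.
const cache = new Map<string, string>();
let fetchCount = 0;

function cachedDownload(url: string): string {
  const hit = cache.get(url);
  if (hit !== undefined) return hit; // cache hit: no network call
  fetchCount++;                      // pretend network hit
  const body = `body-of-${url}`;
  cache.set(url, body);
  return body;
}

const first = cachedDownload('https://example.com/a');
const second = cachedDownload('https://example.com/a'); // served from cache
console.log(fetchCount, first === second); // 1 true
```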