andykais / scrape-pages

generalized scraper using a single instruction set for any site that can be statically scraped
https://scrape-pages.js.org
MIT License

add `limit` flag to ScrapeConfig #9

Closed: andykais closed this issue 5 years ago

andykais commented 5 years ago

`incrementUntil` and `limit` should work together: the scrape stops at whichever bound is reached first.
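The "whichever bound trips first" rule can be sketched as a single predicate. This is purely illustrative; the function name and parameters below are assumptions, not the library's actual API:

```typescript
// Hypothetical sketch: continue scraping only while BOTH bounds allow it.
function shouldContinue(
  index: number,         // zero-based count of increments completed so far
  limit: number,         // max increments allowed by `limit`
  conditionMet: boolean  // e.g. a download failed (`incrementUntil: 'failed-download'`)
): boolean {
  return index < limit && !conditionMet;
}

console.log(shouldContinue(99, 100, false));  // true: under limit, condition not met
console.log(shouldContinue(100, 100, false)); // false: limit reached first
console.log(shouldContinue(5, 100, true));    // false: incrementUntil condition hit first
```

Either bound alone stopping the loop is what makes the two flags composable rather than mutually exclusive.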

andykais commented 5 years ago

Here's the dilemma. Either we put `limit` on `parse` and be done with it, or we put `limit` on `ScrapeConfig`, which means more complicated state to deal with but also more configuration possibilities. E.g., we could then stop a `scrapeNext` clause early, or use `incrementUntil: 'failed-download'` in combination with `limit: 100`. The downside is that it encourages more separation: e.g., we may now split a `parse` from a `download` so that the download may increment while each parse is also limited.

Separating it from `parse` seems more frustrating but also more useful, especially since right now we have no way to stop a `scrapeNext` clause except by reaching the end of a cycle.
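The two placements being weighed might look roughly like this. All field names below are illustrative assumptions, not the library's actual config schema:

```typescript
// Option A: `limit` scoped to a parse step. Simple, but it cannot
// interact with pagination or incrementUntil.
const limitOnParse = {
  parse: { selector: 'img', attribute: 'src', limit: 100 },
};

// Option B: `limit` at the ScrapeConfig level, where it can combine
// with incrementUntil and stop a scrapeNext loop early.
const limitOnConfig = {
  download: 'https://example.com/gallery?page={index}',
  incrementUntil: 'failed-download', // keep paging until a download fails...
  limit: 100,                        // ...or 100 increments, whichever comes first
  parse: { selector: 'img', attribute: 'src' },
};

console.log(limitOnParse.parse.limit, limitOnConfig.limit);
```

Option B is the "more complicated state, more possibilities" branch of the dilemma above.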

andykais commented 5 years ago

I have a solution that seems fairly nice to me. It introduces new emitters, and therefore makes the code less declarative, but when dealing with global state I believe it is wisest to push it out of the project and not assume what the developer wants to do. Here is the new plan:
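The general emitter idea, pushing the stopping decision out to the developer instead of hard-coding limit state inside the library, could be sketched like this. Every name here is an assumption for illustration only:

```typescript
import { EventEmitter } from 'events';

// Illustrative sketch: the library emits events and hands the developer
// a stop() callback, rather than owning a global limit counter itself.
class ScraperEmitter extends EventEmitter {
  private stopped = false;

  // Returns false once a listener has asked the scrape to stop.
  emitPage(pageIndex: number): boolean {
    if (!this.stopped) {
      this.emit('page', pageIndex, () => { this.stopped = true; });
    }
    return !this.stopped;
  }
}

const scraper = new ScraperEmitter();

// The "limit" now lives in developer code, outside the library:
scraper.on('page', (index: number, stop: () => void) => {
  if (index >= 3) stop();
});

const scrapedPages: number[] = [];
for (let i = 0; ; i++) {
  if (!scraper.emitPage(i)) break;
  scrapedPages.push(i);
}
console.log(scrapedPages); // [ 0, 1, 2 ]
```

The trade-off stated above holds in the sketch too: the config is less declarative, but the library no longer has to guess what the developer's stopping condition is.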

andykais commented 5 years ago

Add a functional test suite for download caching and limiting.
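A functional test for download caching might assert that repeated requests for the same URL hit the network only once. A framework-free sketch (a real suite would use the project's test runner; the cache shape here is assumed):

```typescript
// Hypothetical in-memory download cache for illustration.
const cache = new Map<string, string>();
let fetchCount = 0;

function cachedDownload(url: string): string {
  const hit = cache.get(url);
  if (hit !== undefined) return hit; // cache hit: no network call
  fetchCount++;                      // pretend network hit
  const body = `body-of-${url}`;
  cache.set(url, body);
  return body;
}

const first = cachedDownload('https://example.com/a');
const second = cachedDownload('https://example.com/a'); // served from cache
console.log(fetchCount, first === second); // 1 true
```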