Here's the dilemma. Either we put `limit` on `parse` and be done with it, or we put the limit on `ScrapeConfig`, where we will have more complicated state to deal with, but also more configuration possibilities. E.g., now we can stop a `scrapeNext` clause early, or use `incrementUntil: 'failed-download'` in combination with `limit: 100`. The downside here is that it encourages more separation. E.g., we may now separate a parse and a download so that a download can increment while each parse is also limited. Separating it from `parse` seems more frustrating, but also more useful, especially since we currently have no way to stop a `scrapeNext` clause except by reaching the end of a cycle.
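To make the trade-off concrete, here is a rough sketch of the two placements. The config shape is simplified, and any field not mentioned above (e.g. `selector`, `attribute`) is an assumption for illustration only:

```typescript
// Option A: `limit` lives on the `parse` config. Simple, no global state.
const optionA = {
  download: 'https://example.com/gallery?page={}',
  parse: {
    selector: 'img',  // assumed field name
    attribute: 'src', // assumed field name
    limit: 100,       // stop emitting parsed values after 100
  },
}

// Option B: `limit` lives on the ScrapeConfig itself, so it can combine
// with step-level options like `incrementUntil`.
const optionB = {
  download: 'https://example.com/gallery?page={}',
  incrementUntil: 'failed-download', // keep paging until a download fails...
  limit: 100,                        // ...or until 100 values are produced
  parse: { selector: 'img', attribute: 'src' },
}
```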
I have a solution which seems fairly nice to me. It introduces new emitters, and therefore makes the code less declarative, but when dealing with global state I believe it is wisest to push it out of the project rather than assume what the developer wants to do. Here is the new plan:
- add a `limit` flag to the `parse` config; it is run each time a scraper step runs, so there is no global state
- add `emit('stop:<scraper>')` ability to the `emit` function. A developer can listen for `on('<scraper>:complete')` and stop the scraper after a certain period. This will not stop the whole scraper from running, but it will prevent that specific scraper from downloading/parsing any more values. Think of it as closing a pipe valve: it stops any scrapers below it from receiving new values (see the sketch after this list)
- add a functional test suite for download caching and limiting
- `incrementUntil` and `limit` should work together, choosing the lowest common denominator
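A minimal sketch of how the valve might be used from the outside. The `scrape` entry point, its return shape, the runtime options, and the `'gallery'` scraper name are all assumptions for illustration:

```typescript
import { scrape } from 'scrape-pages' // assumed import path

// `config` is a scraper config like the sketches above; the options object
// shape is also an assumption here.
const { on, emit } = scrape(config, { folder: './downloads' })

let cyclesCompleted = 0
on('gallery:complete', () => {
  cyclesCompleted++
  if (cyclesCompleted >= 3) {
    // Close the valve: 'gallery' stops downloading/parsing new values, so
    // scrapers below it receive nothing new, but the rest of the run continues.
    emit('stop:gallery')
  }
})
```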