Open prenaissance opened 1 year ago
Are there any workarounds currently? I want to skip saving anything into storage and implement my own storage mechanism with this. Is it possible?
Setting the option `persistStorage` to false did the trick for me, and there is a corresponding ENV variable for that too.
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/core
Feature
Add ways to subscribe to crawler lifecycle events. Exporting the scraped data is probably one of the most common things to do after a crawler runs, and an idiomatic way to handle that would be great. With this addition, crawlers could be more self-contained, which would benefit projects with multiple crawlers.
Motivation
Lifecycle hooks would let each crawler use its own strategy for exporting data. For example, one crawler sends e-commerce products as CSV to a data lake, while another sends e-commerce company information to a database. Another use case is resiliency: add a handler that sends a "starting" message to temporary storage and a handler that sends a "finished" message. If the crawler crashes, the "finished" message never arrives, and a retry strategy can be applied.
Ideal solution or implementation, and any additional constraints
Add the handlers to the constructor options:
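A minimal sketch of what the constructor-option variant could look like. The hook names (`onStart`, `onFinish`, `onError`), the `scrape` option, and the `SketchCrawler` class are all hypothetical stand-ins for illustration; none of them are part of Crawlee's API.

```typescript
// Hypothetical hook signatures; not part of Crawlee's actual options.
interface LifecycleHooks<T> {
    onStart?: () => void | Promise<void>;
    onFinish?: (items: T[]) => void | Promise<void>;
    onError?: (error: unknown) => void | Promise<void>;
}

interface SketchCrawlerOptions<T> extends LifecycleHooks<T> {
    // Stand-in for the real request handling: resolves to the scraped items.
    scrape: () => Promise<T[]>;
}

// Stand-in crawler that only demonstrates where the hooks would fire.
class SketchCrawler<T> {
    private readonly options: SketchCrawlerOptions<T>;

    constructor(options: SketchCrawlerOptions<T>) {
        this.options = options;
    }

    async run(): Promise<T[]> {
        const { scrape, onStart, onFinish, onError } = this.options;
        await onStart?.();
        let items: T[];
        try {
            items = await scrape();
        } catch (error) {
            // Let the caller react to the crash, then rethrow.
            await onError?.(error);
            throw error;
        }
        await onFinish?.(items);
        return items;
    }
}

// Usage: the export logic lives next to the crawler definition.
const hookLog: string[] = [];
const productCrawler = new SketchCrawler<{ name: string }>({
    scrape: async () => [{ name: 'widget' }],
    onStart: () => { hookLog.push('starting'); },
    onFinish: (items) => { hookLog.push(`finished:${items.length}`); },
});
```

In a real export scenario, `onFinish` would be the place to write the CSV or push rows to a database.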
or add events to the crawler
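The event-based variant might look roughly like the sketch below. The event names (`started`, `finished`, `failed`), their payloads, and the `EventedCrawler` class are assumptions made for illustration and do not exist in Crawlee.

```typescript
// Hypothetical event names and payloads; not Crawlee's actual API.
interface CrawlerEventMap<T> {
    started: { startedAt: Date };
    finished: { items: T[] };
    failed: { error: unknown };
}

type Handler<P> = (payload: P) => void | Promise<void>;

// Stand-in crawler with a small typed event registry.
class EventedCrawler<T> {
    private readonly handlers = new Map<string, Handler<unknown>[]>();
    private readonly scrape: () => Promise<T[]>;

    constructor(scrape: () => Promise<T[]>) {
        this.scrape = scrape;
    }

    on<K extends keyof CrawlerEventMap<T>>(event: K, handler: Handler<CrawlerEventMap<T>[K]>): this {
        const list = this.handlers.get(event) ?? [];
        list.push(handler as Handler<unknown>);
        this.handlers.set(event, list);
        return this;
    }

    private async emit<K extends keyof CrawlerEventMap<T>>(event: K, payload: CrawlerEventMap<T>[K]): Promise<void> {
        // Handlers run sequentially, in registration order.
        for (const handler of this.handlers.get(event) ?? []) {
            await handler(payload);
        }
    }

    async run(): Promise<T[]> {
        await this.emit('started', { startedAt: new Date() });
        let items: T[];
        try {
            items = await this.scrape();
        } catch (error) {
            await this.emit('failed', { error });
            throw error;
        }
        await this.emit('finished', { items });
        return items;
    }
}

// Usage: "starting" / "finished" markers for the resiliency use case.
const statusLog: string[] = [];
const companyCrawler = new EventedCrawler<{ company: string }>(async () => [{ company: 'Acme' }])
    .on('started', () => { statusLog.push('starting'); })
    .on('finished', ({ items }) => { statusLog.push(`finished with ${items.length} item(s)`); });
```

Compared with constructor options, events allow several independent subscribers per lifecycle stage, at the cost of a slightly larger surface.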
Alternative solutions or implementations
One alternative would be to make a wrapper composite class / type that combines the crawler with its lifecycle event handlers. Another would be to switch to a monorepo and make an app for each crawler.
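The wrapper approach could be sketched as follows. It assumes only that the wrapped object has an async `run()` method, as Crawlee crawlers do; `RunnableCrawler`, `LifecycleHandlers`, and `CrawlerWithLifecycle` are hypothetical names, and a fake crawler is used so the snippet stays self-contained.

```typescript
// Anything with an async run() can be wrapped.
interface RunnableCrawler<R> {
    run(): Promise<R>;
}

// Hypothetical handler names for this sketch.
interface LifecycleHandlers<R> {
    onStart?: () => void | Promise<void>;
    onFinish?: (result: R) => void | Promise<void>;
    onCrash?: (error: unknown) => void | Promise<void>;
}

// Composite of a crawler and its lifecycle handlers.
class CrawlerWithLifecycle<R> {
    readonly crawler: RunnableCrawler<R>;
    private readonly handlers: LifecycleHandlers<R>;

    constructor(crawler: RunnableCrawler<R>, handlers: LifecycleHandlers<R> = {}) {
        this.crawler = crawler;
        this.handlers = handlers;
    }

    async run(): Promise<R> {
        await this.handlers.onStart?.();
        let result: R;
        try {
            result = await this.crawler.run();
        } catch (error) {
            await this.handlers.onCrash?.(error);
            throw error;
        }
        await this.handlers.onFinish?.(result);
        return result;
    }
}

// Usage with a fake crawler that reports how many requests it handled.
const wrapperLog: string[] = [];
const fakeCrawler: RunnableCrawler<number> = { run: async () => 3 };
const wrappedCrawler = new CrawlerWithLifecycle(fakeCrawler, {
    onStart: () => { wrapperLog.push('starting'); },
    onFinish: (handled) => { wrapperLog.push(`finished:${handled}`); },
});
```

This works without any change to the library, but every project has to carry its own wrapper, which is the duplication the feature request aims to avoid.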
Other context
No response