Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
We need the same behavior as with the JS version:

- every async operation (such as `open` or `getInput`) checks if it was the first call, and purges automatically unless opted out via the `CRAWLEE_PURGE_ON_START` env var (with a falsy value like `0` or `false`)
- since the SDK uses those storage classes, it has the same behavior out of the box
- internally this works by calling the `purge` method on the storage client (see https://crawlee.dev/api/core/function/purgeDefaultStorages), so both the memory storage and the Apify client need to implement this `purge` method

Related: https://github.com/apify/apify-cli/issues/545
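
For discussion, here is a rough sketch of the purge-on-first-call behavior described above. It is not the Crawlee or SDK implementation: `FakeStorageClient`, `PurgeGuard`, `purge_on_start_enabled` and `open_storage` are hypothetical names made up for illustration. Only the `CRAWLEE_PURGE_ON_START` variable, its falsy values `0` and `false`, and the `purge` method name come from this issue.

```python
import asyncio
import os


class FakeStorageClient:
    """Hypothetical stand-in for a storage client (memory storage or Apify client)."""

    def __init__(self) -> None:
        self.purge_calls = 0

    async def purge(self) -> None:
        # A real client would clear the default storages here.
        self.purge_calls += 1


def purge_on_start_enabled() -> bool:
    """Purging is on by default; a falsy CRAWLEE_PURGE_ON_START value opts out."""
    value = os.environ.get("CRAWLEE_PURGE_ON_START", "true").strip().lower()
    return value not in {"0", "false"}


class PurgeGuard:
    """Calls `client.purge()` at most once, on the first guarded operation."""

    def __init__(self, client: FakeStorageClient) -> None:
        self._client = client
        self._checked = False
        self._lock = asyncio.Lock()

    async def ensure_purged(self) -> None:
        # The lock makes concurrent first calls wait until the purge has finished.
        async with self._lock:
            if self._checked:
                return
            self._checked = True
            if purge_on_start_enabled():
                await self._client.purge()


async def open_storage(guard: PurgeGuard, name: str) -> str:
    """Hypothetical async operation that checks the guard before doing its work."""
    await guard.ensure_purged()
    return f"opened {name}"
```

In this sketch every storage operation would go through `ensure_purged()` first, so the order in which storages are opened does not matter, and a client only has to provide an async `purge` method to take part.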