apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.
https://crawlee.dev/python/
Apache License 2.0
4.22k stars 294 forks

Implement auto-purging of storages #87

Closed (B4nan closed 5 months ago)

B4nan commented 7 months ago

We need the same behavior as in the JS version:

Related: https://github.com/apify/apify-cli/issues/545
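For illustration only, here is a minimal sketch of what auto-purging could look like. It assumes that "the same behavior" means clearing the default storages on startup while leaving named storages and the input record alone. The directory layout, the `INPUT` prefix, and the `purge_default_storages` helper are hypothetical names chosen for this sketch; they are not the crawlee API and not necessarily what the linked PR implements.

```python
import shutil
from pathlib import Path

# Assumed on-disk layout (hypothetical): <root>/<kind>/default/...
DEFAULT_STORAGE_KINDS = ("datasets", "key_value_stores", "request_queues")
# Assumed: records whose file name starts with this prefix survive the purge.
KEEP_PREFIX = "INPUT"


def purge_default_storages(storage_root: Path) -> None:
    """Remove leftovers of a previous run from each `default` storage.

    Named storages (any directory other than `default`) are not touched.
    """
    for kind in DEFAULT_STORAGE_KINDS:
        default_dir = storage_root / kind / "default"
        if not default_dir.is_dir():
            continue  # nothing stored yet for this kind
        for entry in default_dir.iterdir():
            # Keep the input record in the default key-value store.
            if kind == "key_value_stores" and entry.name.startswith(KEEP_PREFIX):
                continue
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
```

A crawler would call such a helper once before the first request is processed, ideally behind a configuration switch so that users who want to resume a previous run can turn it off.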

vdusek commented 5 months ago

Closing as it was resolved in https://github.com/apify/crawlee-py/pull/150