Better approach of making a cache

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

https://crawlee.dev/python/

Apache License 2.0

Better approach of making a cache #86

Open vdusek opened 6 months ago

vdusek commented 6 months ago

This is a follow-up issue to the discussion in https://github.com/apify/crawlee-py/pull/82#discussion_r1548009445.
Currently, we have our own implementation of LRU cache in crawlee/_utils/lru_cache.py.
Let's do it in a more Pythonic way, perhaps by using the built-in caching from the standard library's functools module (the lru_cache decorator)?
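For reference, a minimal sketch of what the functools route looks like. The function `normalize_url` is a hypothetical helper made up for illustration and is not part of crawlee. Note that `lru_cache` memoizes calls to a function rather than acting as a mapping, so whether it can replace the existing class depends on how that class is used.

```python
from functools import lru_cache


@lru_cache(maxsize=2)
def normalize_url(url: str) -> str:
    """Hypothetical helper (not from crawlee): results are cached per argument."""
    return url.strip().lower().rstrip("/")


first = normalize_url("https://Example.com/")   # computed and stored (cache miss)
second = normalize_url("https://Example.com/")  # served from the cache (cache hit)

# cache_info() reports hits, misses, maxsize and the current number of entries.
info = normalize_url.cache_info()
print(info)
```

Once more than `maxsize` distinct arguments have been seen, the least recently used entry is dropped automatically.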

Azathoth-X commented 2 days ago

I was reading up on this and wanted to clear up some things. The lru_cache from functools has no way to delete a given key; it can only clear the whole cache. Can you define what you mean by Pythonic? The cache implementation looks good as it is.
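To illustrate the limitation, here is a rough sketch. `square` and `SmallLRU` are names invented for this example; `SmallLRU` is not the class in crawlee/_utils/lru_cache.py, only an assumption about what a mapping-style LRU with per-key deletion could look like when built on `collections.OrderedDict`.

```python
from collections import OrderedDict
from functools import lru_cache


@lru_cache(maxsize=128)
def square(n: int) -> int:
    return n * n


square(2)
square(3)
# The wrapper only offers wholesale invalidation; a single key cannot be dropped.
square.cache_clear()
size_after_clear = square.cache_info().currsize


class SmallLRU:
    """Illustrative mapping-style LRU cache with per-key deletion."""

    def __init__(self, max_length: int) -> None:
        self._data = OrderedDict()
        self._max_length = max_length

    def __getitem__(self, key):
        # Raises KeyError for a missing key; otherwise marks it most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def __setitem__(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max_length:
            # Evict the least recently used entry (the oldest end of the dict).
            self._data.popitem(last=False)

    def __delitem__(self, key) -> None:
        del self._data[key]

    def keys(self) -> list:
        return list(self._data)


cache = SmallLRU(max_length=2)
cache["a"] = 1
cache["b"] = 2
_ = cache["a"]   # "a" becomes the most recently used entry
cache["c"] = 3   # over capacity, so "b" is evicted
del cache["a"]   # per-key deletion, which lru_cache does not provide
remaining = cache.keys()
```

If per-key deletion turns out not to be needed, the decorator alone may be enough.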

Edit: I just saw how this class is used, and del isn't used, at least in the Python codebase. Please tell me if it is also used in the TypeScript codebase. I will try to implement this with the functools LRU cache and will update you if I make any progress.

29deepanshutyagi commented 1 day ago

I want to work on this issue. Kindly assign it to me, @B4nan, if it's still open.

janbuchar commented 12 hours ago

> I want to work on this issue. Kindly assign it to me, @B4nan, if it's still open.

We don't assign issues for Hacktoberfest. If you want to work on this, open a PR. The first mergeable one gets merged.