Open damms005 opened 7 months ago
Thanks for trying aider and filing this issue.
Re-scraping a webpage should only take a moment, and it ensures you have a fresh copy of the data the page contains. Persisting or caching the content risks not picking up new page content.
Can you help me understand the problem you are having with re-scraping?
Agreed. Although "should only take a moment" adds up when done multiple times a day, especially on a poor connection.
My specific use case comes up when I need particular features of tools/frameworks like Laravel or Filament: I find myself re-scraping the same documentation pages just to provide context for tasks.
I may also be using the tool wrong, yk 🤷♂️
I wonder if this also fits into the broader RAG feature.
I feel optional caching could definitely help. Most of the pages programmers look at don't change frequently, and an optional cache with a TTL of, say, 7 days might noticeably speed up aider.
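Something like a small on-disk cache keyed by URL would probably do. A rough sketch only; the cache location, helper names, and 7-day default are made up for illustration, not aider's actual code:

```python
# Sketch of an opt-in, TTL-based disk cache for scraped pages.
# Paths, names, and the 7-day default are illustrative placeholders.
import hashlib
import time
from pathlib import Path
from typing import Optional

CACHE_DIR = Path.home() / ".aider" / "web-cache"   # hypothetical location
DEFAULT_TTL = 7 * 24 * 3600                        # 7 days, in seconds

def cache_path(url: str) -> Path:
    # One file per URL, named by a hash of the URL
    return CACHE_DIR / hashlib.sha256(url.encode()).hexdigest()

def get_cached(url: str, ttl: int = DEFAULT_TTL) -> Optional[str]:
    """Return the cached page text if it exists and is younger than ttl."""
    path = cache_path(url)
    if path.exists() and (time.time() - path.stat().st_mtime) < ttl:
        return path.read_text()
    return None

def store(url: str, content: str) -> None:
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_path(url).write_text(content)
```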
I wonder if it's worth just respecting the HTTP cache headers sent by the server; for most servers that would be sufficient, whether the caching is time-based, ETag-based, or something else.
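For the ETag case, revalidation is cheap because the server only returns the body when the page actually changed. A minimal sketch using `requests` (the cache structure and function name are illustrative, not anything aider currently does):

```python
# Revalidate a cached copy with the server via ETag / If-None-Match.
import requests

def fetch_with_etag(url: str, cache: dict) -> str:
    """Re-download the page only if the server says it changed."""
    headers = {}
    entry = cache.get(url)
    if entry and entry.get("etag"):
        headers["If-None-Match"] = entry["etag"]

    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code == 304 and entry:
        return entry["content"]          # unchanged, reuse the cached copy

    cache[url] = {"etag": resp.headers.get("ETag"), "content": resp.text}
    return resp.text
```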
This is in furtherance of #400.
It would be a good addition not to have to re-scrape the same webpage over and over, since doing so is wasteful.
Scraped pages should persist, perhaps in a system-wide cache, so that subsequent calls to
/web http://already-scraped.com/specific-page
will only re-scrape if the page has not already been scraped or if the user explicitly asks for it, perhaps via a switch on the /web command.

Many thanks for the awesome job!
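Roughly the flow I have in mind, as a sketch only: the function names, the `force` flag, and the cache helpers (`get_cached`/`store`, as sketched in the TTL comment above) are placeholders, not aider's actual API.

```python
# Hypothetical shape of a cached /web flow; names are placeholders.
import requests

def scrape(url: str) -> str:
    # Stand-in for aider's real scraping step.
    return requests.get(url, timeout=30).text

def web_command(url: str, force: bool = False) -> str:
    cached = get_cached(url)             # TTL cache helper sketched earlier
    if cached is not None and not force:
        return cached                    # reuse the previously scraped page
    content = scrape(url)
    store(url, content)                  # persist for later /web calls
    return content
```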