-
### Which API Provider are you using?
OpenRouter
### Which Model are you using?
Sonnet 3.5
### What happened?
For me, the scraping doesn't work very well. I need to copy and paste the docs inside to …
-
-
Instead of uploading a document, the user should be able to enter a URL.
Archyve needs to:
- find and scrape the main document content
- ideally show the user a preview of the text scraped from…
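
A minimal sketch of the two requirements above (find the main content, then show a preview), using only the Python standard library. This is not Archyve's implementation: the names `MainTextExtractor`, `extract_main_text`, and `preview` are hypothetical, and the heuristic of preferring text inside `<main>` or `<article>` is an assumption that will not suit every site.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is never part of the main document content.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
# Assumption: these tags, when present, wrap the main document content.
MAIN_TAGS = {"main", "article"}


class MainTextExtractor(HTMLParser):
    """Collect visible text, tracking separately the text inside <main>/<article>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.main_depth = 0   # > 0 while inside <main> or <article>
        self.all_text = []
        self.main_text = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in MAIN_TAGS:
            self.main_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in MAIN_TAGS and self.main_depth:
            self.main_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth:
            return
        self.all_text.append(text)
        if self.main_depth:
            self.main_text.append(text)


def extract_main_text(html):
    """Return text from <main>/<article> if any was found, else all visible text."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.main_text or parser.all_text)


def preview(text, limit=200):
    """Return at most `limit` characters of text, marking truncation with '...'."""
    if len(text) <= limit:
        return text
    return text[:limit].rstrip() + "..."
```

The HTML would come from whatever fetches the URL; `preview(extract_main_text(html))` then gives the user a short sample to confirm before the full text is stored.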
-
This issue tracks efforts to improve the web scraping pipeline.
- [ ] Implement Pycookie
- [ ] Implement checks for custom scraper integration (if URL matches a predefined list, use the scraper fo…
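
One possible shape for the custom scraper check, as a rough sketch only. The registry `CUSTOM_SCRAPERS`, the host `docs.example.com`, and the functions `docs_scraper`, `generic_scraper`, and `pick_scraper` are all hypothetical names chosen for illustration; matching on the hostname is an assumption about what "matches a predefined list" means.

```python
from urllib.parse import urlparse


def generic_scraper(url):
    """Placeholder for the default scraping path (hypothetical)."""
    return f"generic:{url}"


def docs_scraper(url):
    """Placeholder for a site-specific scraper (hypothetical)."""
    return f"docs:{url}"


# Hypothetical predefined list: hostname -> scraper to use for that site.
CUSTOM_SCRAPERS = {
    "docs.example.com": docs_scraper,
}


def pick_scraper(url):
    """Return the custom scraper registered for the URL's host, else the generic one."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com and example.com as the same site
    return CUSTOM_SCRAPERS.get(host, generic_scraper)
```

Keeping the mapping in one dictionary means a new site-specific scraper can be added without touching the dispatch logic.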
-
**Summary**
Currently, the axtree content generated for retrieved websites consumes a huge number of tokens and incurs a high cost.
Maybe the combination of Playwright with BeautifulSoup below can save tokens, cost an…
-
Web scraping in Python can be accomplished using libraries like BeautifulSoup, requests, Scrapy, or Selenium. Here’s an example of web scraping using the most common combination of requests and Beauti…
-
### Issue
First of all, great work with this project! We are running Aider non-interactively through a script and have enabled the setting to "always say yes to every confirmation". Similar to #1522,…
-
I think there is a problem in the day 18 web scraping code, because I am not able to do web scraping.
-
Is there a feature in the pipeline to support web scraping functionality, similar to what the LangChain library offers (https://python.langchain.com/v0.1/docs/use_cases/web_scraping/)?
It is…
-
Refer to notebooks/master_webscraping
Expected design: run on a schedule, based on the schemes cases master dataset.
Outcome:
- Update the scraped text in the dataset
- Provide a report of broken links
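
The broken-link report could be sketched roughly as follows, using only the standard library. This is not taken from `notebooks/master_webscraping`; the function names `http_status` and `broken_link_report`, the use of `HEAD` requests, and the rule that a status of 400 or above counts as broken are all assumptions.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response was received."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, ValueError, OSError):
        return None  # DNS failure, timeout, malformed URL, and similar


def broken_link_report(urls, get_status=http_status):
    """Return one entry per URL that failed or answered with status >= 400."""
    report = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            report.append({"url": url, "status": status})
    return report
```

Passing `get_status` as a parameter lets a scheduled job use the real network check while a notebook or test substitutes a stub. Some servers reject `HEAD` requests, so a fallback to `GET` may be needed in practice.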