-
The built-in crawler has no cap on memory usage, and for stores with a lot of products it can quickly chew through available memory while populating the full list of URLs to crawl. Need to handle this …
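
One possible direction, sketched under assumptions: rather than building the full URL list up front, the crawler could consume URLs lazily in fixed-size batches, so memory use is bounded by the batch size. The names `bounded_batches`, `product_urls`, and `max_pending` are hypothetical and are not part of the built-in crawler.

```python
from typing import Iterable, Iterator


def bounded_batches(urls: Iterable[str], max_pending: int = 1000) -> Iterator[list[str]]:
    """Yield URLs in batches of at most max_pending items.

    Only one batch is held in memory at a time, provided the
    input iterable is itself lazy (for example a generator).
    """
    if max_pending < 1:
        raise ValueError("max_pending must be at least 1")
    batch: list[str] = []
    for url in urls:
        batch.append(url)
        if len(batch) >= max_pending:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def product_urls(count: int) -> Iterator[str]:
    # Hypothetical stand-in for paging through a store's product catalogue.
    for i in range(count):
        yield f"https://example.com/products/{i}"
```

A crawl loop would then iterate `for batch in bounded_batches(product_urls(n), max_pending=500)` and finish each batch before requesting the next one.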
-
### User Story
While we are replicating the bucket data across accounts to allow CaDeT data to be visualised via QS, we also need to replicate the Glue metadata in order to make the data usable and a…
-
[AWS Glue](https://aws.amazon.com/glue/features/) seems really useful, especially its fuzzy FindMatches feature ([although LLM-based cosine similarity embeddings should provide similar features](http…
-
I noticed an issue with Crawl4AI where it initially extracts content from the given links as expected. However, once a link fails, the tool starts crawling the website, which I don’t want. The crawlin…
-
-
Hello!
I have a small request:
when you add new crawlers, is it possible to create a separate file with the new instances, something like
```
[
{
"info": "Info",
"created_date": "2024/0…
-
Consider this sample program:
```python
import asyncio
from crawlee.configuration import Configuration
from crawlee.parsel_crawler import ParselCrawler, ParselCrawlingContext
async def de…
-
I am using django-dbbackup with sql_server.pyodbc as my database engine because my database is in MSSQL but it is giving the following error:
```
File "manage.py", line 10, in <module>
execute_from_comma…
-
### feat: Add sitemap and robots.txt for SEO and web crawler management
**Is your feature request related to a problem? Please describe.**
The website currently lacks a sitemap and robots.txt file…
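
As a rough illustration of what the two files could contain, here is a minimal sketch using only the Python standard library. The function names, the example URLs, and the `/admin/` path are assumptions for illustration; the site's real routes and build tooling are not described in this request.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url, disallow=()):
    """Return robots.txt content that applies to all crawlers."""
    lines = ["User-agent: *"]
    if disallow:
        lines += [f"Disallow: {path}" for path in disallow]
    else:
        # An empty Disallow value permits crawling of the whole site.
        lines.append("Disallow:")
    lines.append("")
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Calling `build_robots_txt("https://example.com/sitemap.xml", disallow=["/admin/"])` would produce a file that blocks one path and points crawlers at the sitemap.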
-
# GitHub Action Related
- [ ] Secrets before merging into dev
# K8s Related
- [ ] ConfigMap
# RabbitMQ Related
- [ ] Better Production
# Management Related
- Logging
- [x] Crawler-Schedu…