crawler-management Search Results

nexcess/magento-turpentine #145

Crawler Memory Management

The built-in crawler has no cap on memory usage and for stores with a lot of products it can quickly chew through available memory while populating the full list of URLs to crawl. Need to handle this …

aheadley updated 11 years ago

ministryofjustice/analytical-platform #5859

🕸 Implement a Glue Crawler to Recreate CaDeT Metadata from R…

### User Story While we are replicating the bucket data across accounts to allow CaDeT data to be visualised via QS, we also need to replicate the Glue metadata in order to make the data usable and a…

julialawrence updated 3 days ago

nehanims/notes #52

AWS Glue open source alternatives

[AWS Glue](https://aws.amazon.com/glue/features/) seems really useful, especially its fuzzy FindMatches feature ([although LLM-based cosine similarity embeddings should provide similar features](http…

nehanims updated 1 month ago

unclecode/crawl4ai #237

Prevent Crawl4AI from Crawling After Link Failure – Only Ext…

I noticed an issue with Crawl4AI where it initially extracts content from the given links as expected. However, once a link fails, the tool starts crawling the website, which I don’t want. The crawlin…

Pranshu172 updated 2 days ago

data-skeptic/open-house-crawler #6

Better configuration management for crawler

kylepolich updated 8 years ago

monperrus/crawler-user-agents #366

Request for new instances

Hello! I have a small request: when you add new crawlers, is it possible to create a separate file with the new instances, something like ``` [ { "info": "Info", "created_date": "2024/0…

petrospap updated 1 month ago

apify/crawlee-python #539

Crawler doesn't respect `configuration` argument

Consider this sample program: ```python import asyncio from crawlee.configuration import Configuration from crawlee.parsel_crawler import ParselCrawler, ParselCrawlingContext async def de…

tlinhart updated 5 days ago

jazzband/django-dbbackup #175

Unknown db engine with SQL server

I am using django-dbbackup with sql_server.pyodbc as my database engine because my database is in MSSQL but it is giving the following error: ``` File "manage.py", line 10, in execute_from_comma…

RamizSami updated 3 years ago

ffxiv-teamcraft/ffxiv-teamcraft #2816

feat: Add sitemap and robots.txt for SEO and web crawler man…

### feat: Add sitemap and robots.txt for SEO and web crawler management **Is your feature request related to a problem? Please describe.** The website currently lacks a sitemap and robots.txt file…

cohenaj194 updated 5 months ago

Elven9/TSMC-2022-CloudNative-Final #8

fix: better implementation style

# Github Action Related - [ ] Secrets b4 merging into dev # K8s Related - [ ] ConfigMap # Rabbitmq Related - [ ] Better Production # Management Related - Logging - [x] Crawler-Schedu…

Elven9 updated 2 years ago

1000+ results for crawler-management
