crawling-tasks Search Results

unclecode/crawl4ai #227

Smart/Agentic Crawler (Invite Collaboration)

I'm planning to add a smart crawler that takes a set of user-defined objectives and continues crawling to satisfy them. Objectives can be a query requiring a sufficient amount of information to answer…

unclecode updated 2 weeks ago

internetarchive/dweb-mirror #320

Crawling fails - leaving tasks

Crawling fails on two of the edge cases in https://github.com/internetarchive/dweb-archive/issues/120. In both cases, presence in the crawl causes left over tasks - [ ] bdc-W3PD1123 - [ ] Journ…

mitra42 updated 1 year ago

sesac-analyst/BOK_TEAM_1 #18

github directory management

- [x] create directories according to tasks - [x] crawling - [ ] cleansing - [ ] preprocess - [ ] modeling - [ ] inference

dkswhale updated 3 months ago

unclecode/crawl4ai #237

Prevent Crawl4AI from Crawling After Link Failure – Only Ext…

I noticed an issue with Crawl4AI where it initially extracts content from the given links as expected. However, once a link fails, the tool starts crawling the website, which I don’t want. The crawlin…

Pranshu172 updated 1 week ago

AOEpeople/Aoe_Scheduler #41

Constant 'Too late for the schedule' on Turpentine crawling …

Hi, we have Varnish running with support from the Nexcess Turpentine module. However, we cannot get its crawler running - cron throws errors saying: 'Cron error while executing turpentine_crawl_urls: excepti…

oharlem updated 9 years ago

hellock/icrawler #125

GoogleImageCrawler not working!

from icrawler.builtin import BingImageCrawler, GoogleImageCrawler google_crawler = GoogleImageCrawler(storage={'root_dir': './downloads'}) google_crawler.crawl(keyword='gui based tool', max_num=50…

Shashwat79802 updated 1 month ago

catalyst/moodle-tool_crawler #25

Specify when new crawl sessions can begin, ie crawl every mi…

I realised that once I changed the link crawler robot (\tool_crawler\task\crawl_task) cron to run every Sat instead of ASAP under Server->Scheduled Tasks, the currently crawling process will halt. I a…

allison-soo updated 7 years ago

ctrl-space-labs/gendox-core #100

As a Backend developer I want to enable Java 21 Virtual Thr…

## New Major Version This is a breaking change. Since this is still in beta, only the minor version will be updated though. TBD the exact versioning ### Description Threads Pools is a major Java n…

sekasx updated 3 weeks ago

zevv/duc #161

Indexing algorithm

Hello, this is more of a suggestion than an issue. The duc indexing is already quite fast but you might be interested in the filesystem crawling algorithm of [robinhood](https://github.com/cea-hpc/…

jbd updated 3 years ago

openzim/zimit #433

Consider "new" crawler CLI arguments

We have some "new" (some are a few months old ...) CLI arguments of browsertrix crawler to consider: ``` --seedFile, --urlFile If set, read a list of seed urls, on …

benoit74 updated 1 week ago

857 results for crawling-tasks
