-
We have some "new" (some are a few months old...) CLI arguments of browsertrix crawler to consider:
```
--seedFile, --urlFile If set, read a list of seed urls, on
…
-
We can observe a very high crawling duration variability on [dp.la_en_all recipe](https://farm.openzim.org/recipes/dp.la_en_all). All tasks are using the same image ("ghcr.io/openzim/zimit:1.5.0") and…
-
## Bug Report
**Current Behavior**
I'm not sure if this is a bug in crawler or indexed_search. Maybe it's also a missing configuration on my side, due to the very outdated and/or incomplete docume…
-
Hello,
I'm testing the default training configuration "combo_go2ARX5_pickle_reaching_extreme" and ran into some issues that I could use help with.
**Expected Training Outcome:** Without modifyin…
-
I got to thinking that there really ought to be something you can do with Vengance Spirits, and this is what I came up with last night:
1) Start with some way to simply bottle multiple VSs, and "thr…
-
While extracting multiple links, I encountered a situation where some of them returned a "Too Many Requests" message, but the status code was still 200.
- To address this issue, how can I prevent …
-
Sometimes it might be useful to cancel the task queue. This happened when I sent the wrong meilisearch credentials but the processes were already launched.
This is the way to do it:
```
this.q…
-
Hello, I'm experiencing performance issues with my web crawler after approximately 1.5 to 2 hours of runtime. The crawling speed significantly decreases to about one site per minute or less, and I'm e…
-
We are experiencing extremely slow task submission via the DaskExecutor for very large mapped tasks. With previous flow tests where a task was mapped over roughly 20K items, task submission was suffi…
-
Data validation on the client-side to prevent inconsistent or invalid data being stored
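
As a rough illustration of what such client-side validation could look like, here is a minimal sketch. The `SignupForm` record, its fields, and the specific rules are all hypothetical, chosen only to show the pattern of collecting every error before anything is sent for storage; the real fields and constraints would depend on the application.

```python
import re
from dataclasses import dataclass

# Deliberately loose pattern: one "@", no whitespace, a dot in the domain part.
# This is an illustrative check, not a full email-address validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class SignupForm:
    """Hypothetical record the client would submit for storage."""
    name: str
    email: str
    age: int


def validate(form: SignupForm) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    if not form.name.strip():
        errors.append("name must not be empty")
    if not EMAIL_RE.match(form.email):
        errors.append("email is not well-formed")
    if not 0 < form.age < 150:
        errors.append("age must be between 1 and 149")
    return errors
```

The client would call `validate(form)` before submitting and only send the record when the returned list is empty. Returning all errors at once, rather than raising on the first, lets the interface flag every invalid field in a single pass. The server should still repeat these checks, since client-side validation can be bypassed.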