-
https://github.com/internetarchive/Zeno
-
When testing the endpoint by running `python3 tests/test_timing.py`, sometimes nothing is returned. Try running the analyze-topics and link-extractor scripts in isolation and check their outputs. I th…
-
WORKFLOW | Portal | Bot Message
-- | -- | --
BangaloreDutyFreeStore Workflow | PTC Wave 1 | Bot exception failedInput string was not in a correct format. at System.Number.Strin…
-
@mfahlandt
Trying to write from the release crawler this week, I found it confusing. When it runs, it creates a file based on the prior week's date. Any reason not to change the file name to the…
-
It's not necessary to crawl all ILIAS courses, and it's not very convenient to always specify explicit courses to be crawled.
Allowing a regex (or even a glob) in the `--crawler` option would allow convenient…
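Assuming the `--crawler` option receives a pattern and the configured crawler (course) names are available as a list, the selection could look roughly like the sketch below. `select_crawlers`, `use_regex`, and the example course names are hypothetical and not part of the tool; `fnmatch.fnmatchcase` handles the glob case and `re.fullmatch` the regex case, so a pattern has to match the whole name rather than a substring.

```python
import fnmatch
import re


def select_crawlers(names, pattern, use_regex=False):
    """Hypothetical helper: return the configured crawler names matching a pattern.

    By default the pattern is a shell-style glob; with use_regex=True it is a
    regular expression that must match the entire name.
    """
    if use_regex:
        compiled = re.compile(pattern)
        return [name for name in names if compiled.fullmatch(name)]
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]
```

For example, with courses named `ilias-ws23-algo`, `ilias-ws23-la`, and `ilias-ss23-algo`, the glob `ilias-ws23-*` would select the first two. Requiring a full match avoids a short regex like `algo` silently selecting every course whose name merely contains it.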
-
Ideas:
Ask MP and CP whether they're OK with us scanning content on their platform. What would that look like? How would it work?
Check in with DPR to see if they have any data products now in their …
-
When resuming a crawl, I noticed that passing `--url` seemed necessary, which felt counterintuitive. Once I did this, however, I got a JSON parse error.
```
docker run -v $PWD/crawls:/crawls/ -p 9…
```
-
## Describe the bug
I'm using an Arch-based distro, which ships the newest stable release of Python, 3.12. In Python 3.12, distutils has been removed from the standard library (it was deprecated in 3.10 per PEP 632), so the program errors out because it cannot find distutils…
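The report does not say which distutils helpers the program imports. Assuming it uses something like `distutils.util.strtobool` or `distutils.spawn.find_executable` (a guess, not confirmed by the issue), a minimal sketch of standard-library-only stand-ins could look like this. Installing a recent `setuptools`, which provides a `distutils` shim, is a common stopgap, but removing the dependency is the longer-term fix.

```python
import shutil

# distutils is gone from the standard library in Python 3.12 (PEP 632).
# These are hypothetical drop-in replacements for two commonly used helpers.


def strtobool(val: str) -> int:
    """Mirror distutils.util.strtobool: return 1 for true values, 0 for false ones."""
    val = val.lower()
    if val in ("y", "yes", "t", "true", "on", "1"):
        return 1
    if val in ("n", "no", "f", "false", "off", "0"):
        return 0
    raise ValueError(f"invalid truth value {val!r}")


def find_executable(name: str):
    """Rough stand-in for distutils.spawn.find_executable.

    shutil.which also checks that the file is executable, so results can
    differ slightly in edge cases. Returns the full path, or None if not found.
    """
    return shutil.which(name)
```

Callers would then import these from a local module instead of `distutils`, which works the same on 3.11 and 3.12.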
-
Good afternoon,
I've got a problem with the YaCy user agent config; I'm setting the user agent like so:
```
crawler.userAgent.name=chimmieyacy
crawler.userAgent.string=chimmieyacy
```
in `…
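To confirm which User-Agent the crawler actually sends, independent of what the config file says, one option is to crawl a local page served by a tiny logging server. A minimal sketch using only Python's standard library; the helper names `serve` and `probe` and the port are illustrative and not part of YaCy.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# User-Agent strings received so far, in arrival order.
seen_user_agents = []


class UserAgentLogger(BaseHTTPRequestHandler):
    """Records and prints the User-Agent header of every GET request."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        seen_user_agents.append(agent)
        print(f"User-Agent: {agent}")
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request access log


def serve(port: int = 0) -> HTTPServer:
    """Start the logger on 127.0.0.1 in a background thread (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), UserAgentLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server: HTTPServer, agent: str) -> bytes:
    """Send one request with the given User-Agent, e.g. to check the logger itself."""
    port = server.server_address[1]
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/", headers={"User-Agent": agent}
    )
    # Bypass any proxy settings from the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.read()
```

Calling `serve(8000)` and then starting a crawl of `http://127.0.0.1:8000/` would print each request's User-Agent header; comparing it with the `crawler.userAgent.*` values shows whether the setting is being applied.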
-
Crawler load queues lose sync when the crawl rate drops very low.
The loader loses track and its count keeps increasing. Crawler rate: 0-3 ppm.
A restart clears it.
crawl_for_525982_start_points
Queue Size
[…