-
Judging from the unit tests, jwarc should be able to read ARC files. When I try to read the ARC test files from warcio, I get an exception.
Is this user error in how I'm calling jwarc or are ARC files not…
-
I have around 2-3 jobs running every day. While the jobs run, heap usage increases; once they finish, heap usage stays at the increased value. I am using a fork of Heritrix, so I wanted to unde…
-
Here is the script with the bots added as of 10 January 2024:
-
The first capture of http://leeds2023.co.uk/ redirects to a completely different URL: http://www.slunglow.org/. All other captures for leeds2023 seem fine; only this first capture is affected.
- the http…
-
Thanks for the great list.
It looks like the same bot is listed multiple times.
| name | times |
|----------------------|-------|
| MJ12bot | 3 |
| Barkrowler …
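
A check like the one behind this table can be sketched in a few lines of Python. This is only an illustration: `duplicate_names` is a hypothetical helper, the sample list is made up, and the comparison is case-sensitive after trimming whitespace, which may not match how the real list should be normalised.

```python
from collections import Counter


def duplicate_names(names):
    """Return {name: count} for every name that appears more than once."""
    # Trim surrounding whitespace so "MJ12bot " and "MJ12bot" count together.
    counts = Counter(name.strip() for name in names)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up sample input, not the real bot list.
    sample = ["MJ12bot", "ExampleBot", "MJ12bot", "MJ12bot "]
    for name, n in duplicate_names(sample).items():
        print(f"{name}\t{n}")
```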
-
On my Arch Linux machine, after running
```sh
git clone https://github.com/machawk1/wail.git
cd wail
docker build -t machawk1/wail .
```
Docker outputs _a bunch_, but then asks for `Country of…
-
I'm trying to run the latest Heritrix build (build heritrix-3.3.0-20180529.100446-105-dist.tar.gz which I downloaded [here](https://builds.archive.org/job/Heritrix-3/lastStableBuild/org.archive.heritr…
-
For reporting, we need a script that lists the DC buckets on AWS, and for every file, reports at least the full path and the file size of each. This should output lines of JSON, and an ID like `s3a://…
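
A minimal sketch of such a script, under stated assumptions: it uses boto3's `list_objects_v2` paginator, the function names `list_bucket` and `object_records` are hypothetical, and the `s3a://<bucket>/<key>` ID layout is a guess because the requested format is cut off above. The bucket name in the usage line is a placeholder.

```python
import json
from typing import Iterable, Iterator, Mapping


def list_bucket(bucket: str) -> Iterator[Mapping]:
    """Yield the object entries (dicts with 'Key' and 'Size') of one bucket."""
    import boto3  # third-party; imported here so the formatter works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # 'Contents' is absent when a page (or the whole bucket) is empty.
        yield from page.get("Contents", [])


def object_records(bucket: str, objects: Iterable[Mapping]) -> Iterator[str]:
    """Turn object entries into one JSON document per line."""
    for obj in objects:
        record = {
            # Assumed ID layout; the format requested in the issue is truncated.
            "id": f"s3a://{bucket}/{obj['Key']}",
            "path": obj["Key"],
            "size": obj["Size"],  # size in bytes, as reported by S3
        }
        yield json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    bucket_name = "example-bucket"  # placeholder, not a real bucket
    for line in object_records(bucket_name, list_bucket(bucket_name)):
        print(line)
```

Keeping the formatting separate from the listing lets the JSON output be checked against hand-written entries without AWS credentials.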
-
Tracking update job was changed [from nightly to weekly](https://github.com/GSA/data.gov/issues/4345) due to its long processing time. It is speculated that bot crawling traffic is the cause for the l…
-
### Actual Behavior
I'm trying to build a recipe (https://github.com/bitextor/bitextor/tree/b721f374cb93fca1007644a964897007f6978862/conda-build), but when I run `conda build` and specify the n…