-
On my Arch Linux machine, after running
```sh
git clone https://github.com/machawk1/wail.git
cd wail
docker build -t machawk1/wail .
```
Docker outputs _a bunch_, but then asks for `Country of…
-
I'm trying to run the latest Heritrix build (build heritrix-3.3.0-20180529.100446-105-dist.tar.gz which I downloaded [here](https://builds.archive.org/job/Heritrix-3/lastStableBuild/org.archive.heritr…
-
For reporting, we need a script that lists the DC buckets on AWS, and for every file, reports at least the full path and the file size of each. This should output lines of JSON, and an ID like `s3a://…
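
A minimal sketch of what such a script could look like, in Python. The ID format is cut off above, so the `s3a://<bucket>/<key>` form used here is an assumption, as are the function names and the `path` and `size` field names. The listing helper assumes `boto3` is installed and AWS credentials are configured; the formatter itself only needs the standard library.

```python
import json
from typing import Dict, Iterable, Iterator


def to_json_lines(bucket: str, objects: Iterable[Dict]) -> Iterator[str]:
    """Yield one JSON line per object with an ID, full path, and size in bytes."""
    for obj in objects:
        path = f"{bucket}/{obj['Key']}"
        record = {
            "id": f"s3a://{path}",  # assumed ID format; the original is truncated
            "path": path,
            "size": obj["Size"],
        }
        yield json.dumps(record, sort_keys=True)


def list_objects(bucket: str) -> Iterator[Dict]:
    """Page through every object in a bucket (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the formatter works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent on empty pages, so default to an empty list
        yield from page.get("Contents", [])
```

For a hypothetical bucket name, `for line in to_json_lines("example-bucket", list_objects("example-bucket")): print(line)` would print one record per object. Looping over several buckets would be a small extension.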
-
The tracking update job was changed [from nightly to weekly](https://github.com/GSA/data.gov/issues/4345) because of its long processing time. Bot crawling traffic is suspected to be the cause for the l…
-
Hi!
I'm using the API in order to instantiate many little jobs in Heritrix because I want to crawl single websites. The reason is that I want to remove the complexity of Heritrix when many toeThrea…
-
I wanted to create a TMX file from a set of French and English PDF files (pdf-extract is installed and working). The files are in two directories, one per language.
A tmx …
-
Now that #625 is pretty much sorted, we need a way for users to upload WACZ files (and eventually other formats) via the front-end!
## Feature Requirements
- Users should be able to select and upload…
-
Heritrix has a [Disk space monitor](https://heritrix.readthedocs.io/en/latest/bean-reference.html#diskspacemonitor) that "Monitors the available space on the paths configured. If the available space d…
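
To illustrate the general idea of checking free space on a set of configured paths, here is a small Python sketch. It is not Heritrix's implementation. What the real monitor does once the threshold is crossed is cut off in the quote above, so this sketch only reports the affected paths. The function name and the byte-based threshold are assumptions made for illustration.

```python
import shutil
from typing import Iterable, List


def paths_below_threshold(paths: Iterable[str], min_free_bytes: int) -> List[str]:
    """Return the paths whose filesystem has fewer than min_free_bytes free."""
    low = []
    for path in paths:
        # disk_usage reports total, used, and free bytes for the filesystem
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            low.append(path)
    return low
```

A caller could run this periodically and react whenever the returned list is non-empty.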
-
I am getting an error upon running the command
`$HERITRIX_HOME/dist/src/main/bin/heritrix -a admin:admin`
It seems that lib is a non-existent directory:
```$HERITRIX_HOME/dist/src/main/bin/heritr…
-
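Quite a few self-hosted comment systems like this one have an option to block search engine indexing. After all, the less the admin interface is exposed, the better. I hope this feature can be added.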