-
I am trying to download one of the builds listed in the README. The links I could check are:
- [http://builds.archive.org/maven2/org/archive/heritrix/heritrix/3.4.0-SNAPSHOT/heritrix-3.4.0-SNAPSHO…
-
New releases beyond the 3.4.0 version that was minted by the National Library of Iceland are being pushed to https://github.com/internetarchive/heritrix3/releases. It would be good to get a newer vers…
-
We should run Heritrix against a set of URLs, copy the output, and write a set of tests that check sentry's output against the Heritrix output as a form of ground truth.
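A rough sketch of what such a check could look like, assuming both outputs have already been reduced to plain lists of URLs. The helper names `normalize` and `compare_to_ground_truth` and the sample URLs are hypothetical, not part of either tool:

```python
def normalize(urls):
    """Strip surrounding whitespace and drop blank entries."""
    return {u.strip() for u in urls if u.strip()}


def compare_to_ground_truth(candidate_urls, heritrix_urls):
    """Report URLs the ground truth has that the candidate lacks, and vice versa."""
    candidate = normalize(candidate_urls)
    truth = normalize(heritrix_urls)
    return {
        "missing": sorted(truth - candidate),      # in the Heritrix output only
        "unexpected": sorted(candidate - truth),   # in the candidate output only
    }


# Made-up sample data standing in for the two saved outputs.
report = compare_to_ground_truth(
    ["http://example.com/", "http://example.com/a"],
    ["http://example.com/", "http://example.com/b"],
)
print(report)
```

A test could then assert that both lists in the report are empty, or allow a documented set of known differences.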
-
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 76 source files to /usr/local/tomcat/heritrix/commons/target/classes
[INFO] ----------------------------------------------------…
-
Add an Airflow DAG, based on the `rclone/rclone` Docker image, running e.g.
rclone copy --hdfs-namenode h3nn.wa.bl.uk:54310 --hdfs-username ingest --max-age 24h --no-traverse /mnt/gluster/fc/h…
-
Every web harvester container must have a heritrix container. This is currently done by simple linking. However, this probably won't work well with `docker-compose scale`, as the 1:1 pairing won't o…
-
Using a clean Debian 10.4 install, latest WCT (2.0.2) and latest Heritrix (3.4.0-SNAPSHOT-2020-06-01T04:58:15Z), starting a target instance fails with an error:
```
An unexpected error occured
…
-
This question is about [IPv6 address representation](https://en.wikipedia.org/wiki/IPv6_address#Representation) in WARC captures.
- refers to RFC4291, and
- says that the form (`x:x:x:x:x:x:x:x`) …
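For illustration only, here is how the full `x:x:x:x:x:x:x:x` form and the compressed form of one address relate, using Python's standard `ipaddress` module. The address is a documentation-range example, not taken from any capture, and this sketch says nothing about which form a WARC writer should emit:

```python
import ipaddress

# Example address from the 2001:db8::/32 documentation range (an assumption).
addr = ipaddress.IPv6Address("2001:db8::1")

full_form = addr.exploded      # all eight groups, zero-padded
short_form = addr.compressed   # zero run collapsed with "::"

# Both strings parse to the same address, so a comparison should be done
# on parsed values rather than on the raw text.
same = ipaddress.IPv6Address(full_form) == ipaddress.IPv6Address(short_form)

print(full_form)
print(short_form)
print(same)
```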
-
I wrote a Dockerfile for the current version(s). Maybe you want to look into it and integrate it here.
It works for me but I only have some simple use-cases (like API tests with python3), so I do …
-
```
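The new crawler has found two more new exceptions. The following two URIs raise an exception when processed;
new unit tests need to be added, and the source code may need to be modified.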
2012-06-01T16:38:55.938Z -5 39186
http://item.taobao.com/item.htm?id=14788502928&spm=null LLR
http://ershou.taobao.com/item.htm?id…