-
We should run Heritrix against a set of URLs, copy the output, and write a set of tests that check sentry's output against the Heritrix output as a form of ground truth.
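A minimal sketch of such a test in Python, assuming both crawlers' output can be reduced to a set of fetched URLs. On the Heritrix side it assumes `crawl.log`, where the URI is the fourth whitespace-separated field. The sentry side, the helper names (`urls_from_crawl_log`, `normalize_url`, `compare_crawls`) and the normalization rules are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set
from urllib.parse import urlsplit, urlunsplit


def urls_from_crawl_log(lines: Iterable[str]) -> Set[str]:
    """Extract URIs from Heritrix crawl.log lines (assumed: URI is the 4th field)."""
    urls = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:  # skip blank or truncated lines
            urls.add(fields[3])
    return urls


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme/host, drop fragment, default path '/'."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


@dataclass
class CrawlDiff:
    missing: Set[str]  # fetched by Heritrix (ground truth) but not by sentry
    extra: Set[str]    # fetched by sentry but not by Heritrix


def compare_crawls(heritrix_urls: Iterable[str], sentry_urls: Iterable[str]) -> CrawlDiff:
    """Compare two URL collections after normalization."""
    truth = {normalize_url(u) for u in heritrix_urls if u.strip()}
    got = {normalize_url(u) for u in sentry_urls if u.strip()}
    return CrawlDiff(missing=truth - got, extra=got - truth)
```

A test would then assert that `compare_crawls(...).missing` is empty (or below an agreed tolerance), since the Heritrix output is treated as ground truth; `extra` is worth logging but may legitimately be non-empty.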
-
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 76 source files to /usr/local/tomcat/heritrix/commons/target/classes
[INFO] ----------------------------------------------------…
-
Add an Airflow DAG, based on the `rclone/rclone` Docker image, running, e.g.:
```
rclone copy --hdfs-namenode h3nn.wa.bl.uk:54310 --hdfs-username ingest --max-age 24h --no-traverse /mnt/gluster/fc/h…
```
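As a starting point for such a DAG, a small Python helper that builds the container arguments. It assumes the `rclone/rclone` image uses the `rclone` binary as its entrypoint, so the command starts at the `copy` subcommand. The helper name and its parameters are hypothetical; the flag values are the ones from the command above, and the source and destination are left to the caller because the real paths are elided here.

```python
import shlex
from typing import List


def rclone_copy_args(
    source: str,
    dest: str,
    namenode: str = "h3nn.wa.bl.uk:54310",
    username: str = "ingest",
    max_age: str = "24h",
) -> List[str]:
    """Build arguments for `rclone copy` (hypothetical helper).

    The leading `rclone` is omitted on the assumption that the
    rclone/rclone image uses the rclone binary as its entrypoint.
    """
    return [
        "copy",
        "--hdfs-namenode", namenode,
        "--hdfs-username", username,
        "--max-age", max_age,  # only transfer files younger than this
        "--no-traverse",       # don't list the whole destination first
        source,
        dest,
    ]


if __name__ == "__main__":
    # Placeholder paths: the real ones are truncated in the issue text.
    print(shlex.join(rclone_copy_args("SOURCE_PATH", "DEST_REMOTE:path")))
```

In Airflow, the resulting list (or its `shlex.join` string) could be passed as the `command` of a `DockerOperator` with `image="rclone/rclone"`, scheduled daily to line up with `--max-age 24h`; the exact operator arguments depend on the installed Docker provider version.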
-
New releases beyond the 3.4.0 version that was minted by the National Library of Iceland are being pushed to https://github.com/internetarchive/heritrix3/releases. It would be good to get a newer vers…
-
I wrote a Dockerfile for the current version(s). Maybe you want to look into it and integrate it here.
It works for me, but I only have some simple use cases (like API tests with python3), so I do …
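For anyone reproducing that kind of API test, a minimal Python 3 sketch using only the standard library. It assumes the Heritrix 3 REST API conventions: the engine at `https://localhost:8443/engine` with a self-signed certificate, digest authentication, and job actions sent as a POSTed `action` form field to `/engine/job/<name>`. The function names, the credentials and the job name are hypothetical.

```python
import ssl
import urllib.parse
import urllib.request

# Subset of job actions accepted by the Heritrix 3 REST API (assumed).
JOB_ACTIONS = {"build", "launch", "pause", "unpause", "checkpoint", "terminate", "teardown"}


def build_job_request(base_url: str, job_name: str, action: str) -> urllib.request.Request:
    """Build a POST request asking the engine to perform `action` on a job."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    url = f"{base_url.rstrip('/')}/job/{urllib.parse.quote(job_name)}"
    data = urllib.parse.urlencode({"action": action}).encode("ascii")
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Accept", "application/xml")  # ask for XML instead of HTML
    return request


def make_opener(base_url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """Opener with digest auth that accepts a self-signed certificate."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    context = ssl.create_default_context()
    context.check_hostname = False  # test setups only: skip certificate checks
    context.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPDigestAuthHandler(passwords),
    )


if __name__ == "__main__":
    base = "https://localhost:8443/engine"
    opener = make_opener(base, "admin", "admin")  # hypothetical credentials
    with opener.open(build_job_request(base, "myjob", "launch")) as response:
        print(response.status, response.read()[:200])
```

Keeping request construction separate from the network call lets the request shape be unit-tested without a running Heritrix instance.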
-
Every web harvester container must have its own Heritrix container. This is currently done with simple container linking. However, this probably won't work well with `docker-compose scale`, as the 1:1 pairing won't o…
-
On a clean Debian 10.4 install with the latest WCT (2.0.2) and the latest Heritrix (3.4.0-SNAPSHOT-2020-06-01T04:58:15Z), starting a target instance fails with the following error:
```
An unexpected error occured
…
-
```
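The new crawler has turned up two more exceptions: the following two URIs cause an exception when processed. New unit tests need to be added, and the source code may need to be changed.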
2012-06-01T16:38:55.938Z -5 39186
http://item.taobao.com/item.htm?id=14788502928&spm=null LLR
http://ershou.taobao.com/item.htm?id…
-
WAIL - Heritrix User Interface (UI) Basic Requirements
1. The UI must list all (potentially many) Heritrix jobs currently residing in the jobs directory.
2. On selecting a job in the UI listing (1), …
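Requirement 1 could be backed by a helper like the following Python sketch. It assumes the usual Heritrix 3 layout, where each job is a subdirectory of the jobs directory containing a `crawler-beans.cxml` configuration file; the function name is hypothetical.

```python
from pathlib import Path
from typing import List


def list_heritrix_jobs(jobs_dir: str) -> List[str]:
    """Return the sorted names of job directories under `jobs_dir`.

    A subdirectory counts as a job if it contains crawler-beans.cxml
    (assumed Heritrix 3 layout). Returns [] if jobs_dir does not exist.
    """
    root = Path(jobs_dir)
    if not root.is_dir():
        return []
    return sorted(
        child.name
        for child in root.iterdir()  # lazy directory scan
        if child.is_dir() and (child / "crawler-beans.cxml").is_file()
    )
```

Since only a couple of filesystem checks are made per entry, the UI can afford to call this on every refresh even when there are many jobs, and use the returned names as the selectable listing for requirement 2.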
-
Hello,
When I run a new job, I get this error while the job is in progress:
dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
I found this…