-
Excuse me, if I want to transform a WARC file into text, do you have any tools in the source or any APIs that would make the transformation easier?
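For illustration only, here is a minimal standard-library sketch of one way such a conversion could work; it is not the API of the project being asked about. It assumes `response` records whose HTTP payload is plain, uncompressed `text/html` (no chunked transfer encoding, no `Content-Encoding`), and the names `open_warc`, `iter_warc_records`, and `warc_to_text` are hypothetical. For real archives a dedicated WARC library is likely the safer choice.

```python
import gzip
import io
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def open_warc(path):
    """Open a .warc or .warc.gz file as a binary stream (hypothetical helper)."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_records(stream):
    """Yield (headers, block_bytes) for each record in a decompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        length = int(headers.get("content-length", 0))
        yield headers, stream.read(length)


def warc_to_text(stream):
    """Yield (target_uri, text) for each HTML response record."""
    for headers, block in iter_warc_records(stream):
        if headers.get("warc-type") != "response":
            continue
        # Split the stored HTTP response into its header section and body.
        http_head, _, body = block.partition(b"\r\n\r\n")
        if b"text/html" not in http_head.lower():
            continue
        parser = _TextExtractor()
        parser.feed(body.decode("utf-8", "replace"))
        parser.close()
        yield headers.get("warc-target-uri", ""), " ".join(parser.parts)
```

A caller would pass the stream from `open_warc(path)` to `warc_to_text` and write each yielded text to a file; the character encoding is assumed to be UTF-8 here, which real captures do not guarantee.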
-
The video is still available at this live URL:
https://sommansiger.nu/img/SomManSiger_full.mp4
Here are some of the fields from Solr. It is the last two that have been 'video' instead.
content_type_ser…
-
2024-02-14 21:01 INFO 2048692:root - Downloaded https://dl.fbaipublicfiles.com/laser/CCMatrix/v1.0.0/2020-10_0278.tsv.gz [200] took 8s (5766.4kB/s)
2024-02-14 21:01 INFO 2048692:root - Starting downl…
-
Now that we have a way to upload WARCs from the admin interface (#436), an index at replay startup should no longer be mandatory. The new replay CLI should behave like this:
* `ipwb replay` shoul…
-
I noticed https://github.com/wabarc/cairn is on the list, but it doesn't support WARC/WACZ. Should that at least be noted in-line?
-
Hi,
in one of my mappings I have a boolean conjunction as condition: a subject should only be created if the field `warc-header.warc-type` has the string value `response` **and** if the string of t…
-
The field `url_norm` is essential for looking up URLs entered by humans, but it is disabled by default in `reference.conf`, and enabling it is buried as a side effect of enabling `warc.index.extract.l…
tokee updated
2 years ago
-
Hello,
When I run a new job, I get this error while the job is in progress:
dk.netarkivet.common.exceptions.IOFailure: Crawl probably interrupted by shutdown of HarvestController
I found this…
-
### What did you expect to happen? What happened instead?
When loading a WARC file in Chrome, records are processed at a reasonable speed. When loading the same file in Firefox it is wildly slow a…
-
**Is your feature request related to a problem? Please describe.**
Upon opening a WARC file of an archived email message in Archive Web.Page, I found the "Pages" tab to be empty, despite there being …