-
Hi, I used your services and completed all the steps, but when I run extract, the URL regexp I wrote in the config file does not match any URLs. I get an error in the logs, but when I match it manually in Python everythi…
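One way to narrow this kind of discrepancy down is to test the pattern under each of Python's matching modes, since a pattern that passes `re.search` can still fail `re.match` or `re.fullmatch`. The sketch below is only illustrative: the pattern and URLs are hypothetical, and which mode the extract step actually uses is an assumption that would need checking.

```python
import re

# Hypothetical pattern and URLs, for illustration only.
PATTERN = r"example\.com/articles/\d+"
URLS = [
    "https://example.com/articles/42",
    "https://example.com/articles/42?ref=home",
    "https://example.com/about",
]


def match_modes(pattern: str, url: str) -> dict:
    """Report whether the pattern matches the URL under each matching mode."""
    compiled = re.compile(pattern)
    return {
        "search": compiled.search(url) is not None,        # anywhere in the string
        "match": compiled.match(url) is not None,          # anchored at the start
        "fullmatch": compiled.fullmatch(url) is not None,  # must cover the whole string
    }


for url in URLS:
    print(url, match_modes(PATTERN, url))
```

If the results differ between modes for a URL that is expected to match, the pattern probably needs explicit anchoring or a leading `.*`, depending on how the tool applies it.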
-
I am building a WARC file standards conformance tester, and I am doing a survey of tools which have generated WARCs that are present in archives.
I'd like to assemble a set of wpull WARCs over the …
-
#### Version information:
go-ipfs version: 0.4.4
#### Type: Feature, Enhancement
#### Priority: P4
#### Area: Tools, Importer
#### Description:
Like in case of WARCs, gzip files do suppo…
-
Reading the definition of `original-content`, `original-links` and `echo-original-headers`, this seems rather more complicated than it needs to be.
If the archive serves `original-content`, then by…
-
File Attributes (either in POSIX or extended attributes) of UNIX File System Objects SHOULD be preserved as long as they allow canonical serialisation (i.e. can be uniquely hashed regardless of env).
…
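As a rough illustration of what "canonical serialisation" could mean here, the sketch below hashes a set of attributes after serialising them with sorted keys and fixed separators, so the digest does not depend on the order in which the attributes were read. The attribute names and the choice of JSON plus SHA-256 are assumptions made for the example, not something the text above prescribes.

```python
import hashlib
import json


def canonical_attr_digest(attrs: dict) -> str:
    """Hash file attributes via a canonical serialisation.

    Keys are sorted and separators fixed so the same attributes always
    produce the same bytes, whatever order they were collected in.
    """
    canonical = json.dumps(
        attrs, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute sets: same content, different collection order.
a = {"mode": "0644", "uid": 1000, "gid": 1000, "user.comment": "draft"}
b = {"user.comment": "draft", "gid": 1000, "uid": 1000, "mode": "0644"}

print(canonical_attr_digest(a) == canonical_attr_digest(b))
```

Attributes whose values vary by environment (for example, numeric IDs that differ between hosts) would still need to be normalised or excluded before hashing.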
-
When an archived HTML page is displayed, all links to its inlined resources need to be resolved to WARC files and offsets. For pages with hundreds of such resources, this is a costly process. Currently …
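To make the cost concrete, here is a minimal sketch of resolving a page's resources in one batch, looking up each distinct URL only once. The in-memory index, file names, and offsets are hypothetical; a real archive would query its own CDX-style index instead.

```python
from typing import Dict, Iterable, Optional, Tuple

Location = Tuple[str, int]  # (WARC file name, byte offset)

# Hypothetical in-memory index, for illustration only.
INDEX: Dict[str, Location] = {
    "http://example.org/style.css": ("crawl-0001.warc.gz", 1024),
    "http://example.org/logo.png": ("crawl-0001.warc.gz", 20480),
}


def resolve_batch(
    urls: Iterable[str], index: Dict[str, Location]
) -> Dict[str, Optional[Location]]:
    """Resolve every distinct URL once; unresolved URLs map to None."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return {url: index.get(url) for url in dict.fromkeys(urls)}


page_resources = [
    "http://example.org/style.css",
    "http://example.org/logo.png",
    "http://example.org/style.css",  # duplicate reference on the page
    "http://example.org/missing.js",
]

resolved = resolve_batch(page_resources, INDEX)
print(resolved)
```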
-
From https://farm.zimit.kiwix.org/pipeline/899453d8-6002-46a5-8c36-cc2f1c4783ef/debug
```
Traceback (most recent call last):
  File "/usr/bin/zimit", line 8, in <module>
    sys.exit(zimit.zimit())
…
```
-
ArchiveBox should be able to load WARCs from outside sources, replay them with `pywb`, and re-archive them using all the redundant archive methods like Chrome Headless, Wget, etc.
This would be mos…
-
Using the [CDXGenerator](https://github.com/internetarchive/ia-hadoop-tools/blob/master/src/main/java/org/archive/hadoop/jobs/CDXGenerator.java) I generated CDX files out of WARCs. Unfortunately, the …
-
It looks like Shaarli feeds are not being parsed correctly and markup is being included in the link structure (much like ticket 134 for Pocket). Also, it looks like Shaarli detail and tag pages are be…