-
It is a great pleasure working with warc; however, I'm experiencing an error when mapping a larger number of files. It seems that the connections to the files are not closed. Please find below the reproduc…
-
When starting from scratch, it is very hard to guess what the requirements for a SolrWayback setup are. There should be a guide with common setups that outlines the hardware, the overall setup, and the challenges, i.e.
…
-
I think this is something people know, but it is not explicitly stated: can a record have multiple extension fields of the same type?
Section 5.1 of the 1.1 spec says "WARC named fields of the same…
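Since the spec quote is cut off here, the following doesn't settle the question. It is a minimal Go sketch, assuming repeated fields are allowed, of a header reader that keeps every occurrence of a field instead of overwriting earlier ones. `X-Note` is a made-up extension field, `parseHeader` is a hypothetical name, field names are matched exactly as written, and folded continuation lines are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// header keeps every occurrence of each named field, in order.
type header struct {
	Version string
	Fields  map[string][]string
}

// parseHeader reads a WARC header block: a version line, then
// "Name: value" lines, ended by a blank line. CRLF or LF both work.
func parseHeader(s string) (header, error) {
	sc := bufio.NewScanner(strings.NewReader(s))
	h := header{Fields: make(map[string][]string)}
	if !sc.Scan() {
		return h, fmt.Errorf("empty header")
	}
	h.Version = sc.Text()
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			break // end of the header block
		}
		name, value, ok := strings.Cut(line, ":")
		if !ok {
			return h, fmt.Errorf("malformed field: %q", line)
		}
		// Append rather than overwrite, so repeated fields survive.
		h.Fields[name] = append(h.Fields[name], strings.TrimSpace(value))
	}
	return h, sc.Err()
}

func main() {
	h, err := parseHeader("WARC/1.1\r\nWARC-Type: resource\r\nX-Note: a\r\nX-Note: b\r\n\r\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Fields["X-Note"]) // [a b]
}
```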
-
Possibly hinting at other escaping issues.
Example:
```
WARC/1.0
WARC-Date: 2004-11-10T16:15:13Z
WARC-Source-URI: file://waste/images/17#.jpg
WARC-Created-Date: 2018-02-06T16:26:13Z
WARC-Ty…
```
-
Hello. Is it possible to load and replay, with this software, multiple WARCs that depend on each other? The WARCs I've acquired are in this format, and my ability to replay them is limited to i…
-
There is support for storing and writing custom payloads, but all payloads are read back as `RawPayload`. Create a `FileReader` so that it is possible to modify how a WARC file is read.
```go
file, err := os.Open("/pat…
```
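As a rough illustration of the kind of extension point being requested, here is a Go sketch of a reader that picks a payload decoder by `Content-Type` and falls back to the raw block bytes. Every name in it (`Payload`, `rawPayload`, `textPayload`, `reader`, `Register`, `Decode`) is hypothetical and is not the library's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Payload is a hypothetical interface for a decoded record block.
type Payload interface {
	Kind() string
}

// rawPayload is the fallback: the block bytes, untouched.
type rawPayload []byte

func (rawPayload) Kind() string { return "raw" }

// textPayload is an example custom payload type.
type textPayload string

func (textPayload) Kind() string { return "text" }

// decoder turns a record block into a Payload.
type decoder func(block []byte) (Payload, error)

// reader dispatches on Content-Type; unregistered types stay raw.
type reader struct {
	decoders map[string]decoder
}

// Register associates a media type (case-insensitive) with a decoder.
func (r *reader) Register(contentType string, d decoder) {
	if r.decoders == nil {
		r.decoders = make(map[string]decoder)
	}
	r.decoders[strings.ToLower(contentType)] = d
}

// Decode ignores parameters such as "; charset=utf-8" when looking up a decoder.
func (r *reader) Decode(contentType string, block []byte) (Payload, error) {
	base, _, _ := strings.Cut(contentType, ";")
	if d, ok := r.decoders[strings.ToLower(strings.TrimSpace(base))]; ok {
		return d(block)
	}
	return rawPayload(block), nil
}

func main() {
	var r reader
	r.Register("text/plain", func(b []byte) (Payload, error) { return textPayload(b), nil })
	p, _ := r.Decode("text/plain; charset=utf-8", []byte("hi"))
	q, _ := r.Decode("image/jpeg", []byte{0xff, 0xd8})
	fmt.Println(p.Kind(), q.Kind()) // text raw
}
```

Keeping the raw fallback means existing readers see no change unless a decoder is registered.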
-
- Avoid parsing the entirety of the WARC file
- Don't parse the HTTP records inside
Any improvements we can make so that large and gargantuan WARC files can be read and processed quickly.
-
This field was [previously discussed](https://github.com/internetarchive/warcprox/issues/13#issuecomment-570417173) by @ato, @nlevitt, and @JustAnotherArchivist in an issue in a different repository. T…
-
When reading gzip-compressed WARC files, many of the contained entries are skipped or misread. To reproduce, take Common Crawl data in .gz format and count the number of entries found by the WARC lib…
-
Definition: it is necessary to have a common way of rendering AJAX interactions in WARC.
Decision: Propose a way to record rendering files either in the standard or as an appendix (NB from Clément: …