-
An example might be to ask the user whether they want the WARCs moved or copied to the archives folder, replayed immediately in a certain engine, the URIs in the WARCs recrawled, etc.
-
If the API times out or the script breaks in the middle of creating an AIP, it currently has to be deleted before the script runs again in order for it to be correctly finished. For AIPs with a lot of…
-
Via @ikreymer, Web Archive Collection Zipped (WACZ) Format, https://github.com/webrecorder/wacz-format (MIT, potentially reusable)
Example of MDN WACZ at https://twitter.com/webrecorder_io/status/1…
-
It was requested that we eventually switch to writing WARC files instead of ARC files, in order to record additional metadata.
-
Hi, I have recorded some Twitter pages with the Conifer desktop app, and they replay smoothly on Conifer. But when I try to replay the .warc files on [OpenWayback](https://github.com/iipc/openwayback), it gives e…
-
i want to extract files from warcs.
when i use `jwattools extract`, i get a bunch of filenames like `extracted.001` or something like that. am i doing something wrong?
ftc2 updated
11 months ago
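For what it's worth, here is a minimal sketch of how an extractor could name output files after each record's `WARC-Target-URI` instead of a running counter. This is not how `jwattools` works; `filename_from_uri` and the `index.html` fallback are hypothetical, and the sketch ignores name collisions between records.

```python
from urllib.parse import unquote, urlsplit


def filename_from_uri(target_uri, fallback="index.html"):
    """Guess a file name from the last path segment of a WARC-Target-URI."""
    # Drop scheme, host, query and fragment; decode %XX escapes in the path.
    path = unquote(urlsplit(target_uri).path)
    # Keep only the text after the final slash.
    name = path.rsplit("/", 1)[-1]
    # Directory-style URIs and dot segments have no usable name.
    if name in ("", ".", ".."):
        return fallback
    return name


if __name__ == "__main__":
    for uri in ("http://example.com/img/logo.png?v=2", "http://example.com/"):
        print(uri, "->", filename_from_uri(uri))
```

Two records with the same last path segment would still overwrite each other, so a real tool would probably need to keep the host and directory parts as well.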
-
[example file](https://kiska.b-cdn.net/omESX/example.warc.gz)
```python
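import warc  # the `warc` package provides warc.open()

f = warc.open("example.warc.gz")
for record in f:
    # Print the target URI and payload length of each record
    print(record['WARC-Target-URI'], record['Content-Length'])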
```
### E…
-
I generated some WARC files with [node-warc](https://www.npmjs.com/package/node-warc) and they are mostly fine except for the bookmarks/sidebar list of pages which isn't showing the date.
Interesti…
gjvnq updated
2 years ago
-
When an archived HTML page is displayed, all links to its inlined resources need to be resolved to WARC files and offsets. For pages with hundreds of such resources, this is a costly process. Currently …
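To make the "WARC file and offset" part concrete, here is a rough sketch of reading one record once its offset is known. It assumes the common layout where each record in a `.warc.gz` is its own gzip member. The `index` dict, the `read_member_at` helper, and the two shortened records are hypothetical stand-ins, not this project's actual index or data.

```python
import gzip
import io
import zlib


def read_member_at(stream, offset, chunk_size=4096):
    """Decompress the single gzip member that starts at `offset`."""
    stream.seek(offset)
    # wbits = 16 + MAX_WBITS tells zlib to expect gzip framing.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    parts = []
    while not decomp.eof:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # truncated input
        parts.append(decomp.decompress(chunk))
    return b"".join(parts)


# Two shortened stand-in records (real WARC records carry more headers).
rec_a = b"WARC/1.0\r\nWARC-Target-URI: http://example.com/a.css\r\n\r\n"
rec_b = b"WARC/1.0\r\nWARC-Target-URI: http://example.com/b.js\r\n\r\n"
member_a = gzip.compress(rec_a)
member_b = gzip.compress(rec_b)
archive = io.BytesIO(member_a + member_b)

# Hypothetical index: target URI -> byte offset of its gzip member.
index = {
    "http://example.com/a.css": 0,
    "http://example.com/b.js": len(member_a),
}

record = read_member_at(archive, index["http://example.com/b.js"])
```

The read itself is a single seek plus one member's worth of decompression, so the per-page cost mostly comes from doing one index lookup per inlined resource.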
-
```
#first run
1.7M warc_cache/warcs/book.pythontips.com.warc.gz
#Second run, exact same code
516K warc_cache/warcs/book.pythontips.com.warc.gz
#Deleted dedupe but not warc file
1.7M warc_cach…