-
I only want files, not WARC.
Can grab-site output regular files (like HTML and images) for me, like wget can? (Links must be converted to relative links.)
Side question: has anyone here actually …
ftc2 updated
2 months ago
-
Hi!
When I found out about this project, its name made me think it was a tool to read [WARC files](https://en.wikipedia.org/wiki/Web_ARChive), which stands for... Web ARChives!
Is there support …
-
First of all, thank you for this great project!
Is there any reason why ARC support was implemented first? Is WARC support planned for the near future?
-
I downloaded a website from Internet Archive using [wayback-machine-downloader](https://github.com/hartator/wayback-machine-downloader) then created a WARC using warcit with the following command: `wa…
-
Hello,
I would like to know if it is possible to get both WARC files compressed (not only the metadata one).
Thanks
nasry updated
6 years ago
-
I am using the local executor. My machine has 48 CPUs and 348 GB of RAM. Any idea how to speed this up? Currently a single task (task=1, running on one warc.gz file of about 1 GB) takes half an hour. Thi…
-
When converting content in an archive it is useful for diagnostic purposes to record the versions of major software components used and important conversion options. Another common use case is to iden…
-
This is the same issue as https://github.com/webrecorder/wombat/issues/82, which has been partially solved in https://github.com/webrecorder/wabac.js/pull/128 (and a few other commits).
Note that htt…
-
See https://github.com/webrecorder/browsertrix-crawler/issues/630
-
They are being correctly parsed from the config file, and they are being correctly instantiated.
But they are not working: these pesky HTTP 4xx and 5xx errors are still being written to the `WARC` file…