-
Right now the `FETCH_WARC` option only creates a simple HTML-file WARC with wget; it doesn't save the requests that headless Chrome makes dynamically after JS executes.
We should set up https://gi…
-
I'm not too familiar with how pywb works, but I want it so that if a WARC is deleted (either intentionally or by accident), pywb recognizes that it's no longer there and doesn't try to sh…
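pywb finds captures through its CDX/CDXJ index, so one possible workaround (an assumption on my part, not a built-in pywb feature) is to prune index entries whose WARC file is no longer on disk before pywb reads the index. The sketch below assumes pywb-style CDXJ lines of the form `<urlkey> <timestamp> <json>`, with a `filename` field in the JSON that is relative to the archive directory; `filter_missing_warcs` is a hypothetical helper name.

```python
import json
import os


def filter_missing_warcs(cdxj_lines, archive_dir):
    """Yield only the CDXJ lines whose referenced WARC file still exists.

    Assumes pywb-style CDXJ lines: "<urlkey> <timestamp> <json>", where the
    JSON block has a "filename" field relative to archive_dir.
    """
    for line in cdxj_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split off urlkey and timestamp; the JSON blob itself may contain spaces.
        parts = line.split(" ", 2)
        if len(parts) < 3:
            continue  # skip malformed lines
        fields = json.loads(parts[2])
        filename = fields.get("filename")
        # Keep the entry only if the WARC it points to is still present.
        if filename and os.path.exists(os.path.join(archive_dir, filename)):
            yield line
```

Writing the surviving lines back out as the collection's index would stop replay from pointing at captures that can no longer be served.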
-
I am aware that Browsertrix uses pywb in the background. I tried archiving `https://www.kugou.com/` with Browsertrix and noticed that several notable elements were missing.
I started pywb using
`docker run -e I…
-
I encountered an issue when archiving [this page](http://matkelly.com/brew/), which has many images. If I open the page, click the WARCreate icon, then immediately click the Generate WARC button, many…
-
## Description of the issue
`pack-warcs.sh` depends on the GNU `findutils` package and therefore cannot run on systems or Docker containers that lack `findutils`, due to the regex not bei…
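If the script only relies on `find -regex` to select files, one way to drop the GNU dependency is to do the regex matching outside `find`. The sketch below is a minimal illustration for containers that have Python available; the pattern `.*\.warc(\.gz)?$` and the function name `find_matching` are hypothetical, since the actual regex used by `pack-warcs.sh` is not shown here. Like GNU `find -regex`, it matches against the whole path rather than searching within it.

```python
import os
import re

# Hypothetical pattern: the real regex in pack-warcs.sh is not shown in this issue.
WARC_PATTERN = re.compile(r".*\.warc(\.gz)?$")


def find_matching(root, pattern=WARC_PATTERN):
    """Return sorted paths under root whose full path matches pattern.

    fullmatch() mirrors `find -regex`, which matches the whole path.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if pattern.fullmatch(path):
                matches.append(path)
    return sorted(matches)
```

In a pure shell script, a similar effect can often be achieved with `find <dir> \( -name '*.warc' -o -name '*.warc.gz' \)`, since `-name` is specified by POSIX whereas `-regex` is not.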
-
Connecting @machawk1 & @oduwsdl: https://github.com/oduwsdl/ipwb/issues/211
We should define a task that:
1. Start with a user-generated collection of URLs. Allow users to fire off a "task" that w…
-
I'm running ArchiveSpark from Docker.
The enrich function does not add any payload when peekJson is called.
The payloads in my WARC files are binary.
Could that be the problem?
If it is, then is there a…
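One possible cause (an assumption, not confirmed here) is that the HTTP responses were captured with `Content-Encoding: gzip` and stored without being decoded, so text-based enrichments see compressed bytes. ArchiveSpark itself is Scala; the Python sketch below is only a quick diagnostic for a payload you have already extracted from a record, and `decode_payload` is a hypothetical helper, not an ArchiveSpark API.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_payload(payload: bytes, encoding: str = "utf-8") -> str:
    """Return the payload as text, gunzipping it first if it looks gzip-compressed.

    Assumption: "binary" payloads come from a Content-Encoding: gzip response
    that was stored without being decoded.
    """
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    # errors="replace" keeps genuinely binary content (images, etc.) from raising.
    return payload.decode(encoding, errors="replace")
```

If this turns the payload into readable HTML, the enrich step probably needs decoded content; if it stays unreadable, the record is likely genuinely binary (for example an image).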
-
A youzim.it run of https://archives.nyphil.org/ failed, reporting lots of unrecognized characters.
Task is [here](https://farm.youzim.it/pipeline/3cd41b6b-2d81-4acb-8948-a6820c5fa07f).
Command used:
``…
-
For archiving crawlers
-
I'm getting 'permanently moved to here' results when loading up a WARC I made. The word 'here' is a link, and when I click it, it just reloads the same 'permanently moved' page. This would make sense …
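A "Moved Permanently" page whose link reloads itself often means the replay is serving a 3xx record whose `Location` resolves back to the same URL, or to a URL that redirects back (for example `http` vs `https`, or with and without a trailing slash). That is a guess about this particular WARC. The sketch below checks for such a cycle, assuming you have already built a mapping from each captured URL to the `Location` header of its redirect record (extracting that from the WARC is not shown); `find_redirect_loop` is a hypothetical helper.

```python
from urllib.parse import urljoin


def find_redirect_loop(start_url, redirects, max_hops=20):
    """Follow captured redirects from start_url and return the cycle, if any.

    redirects maps a captured URL to the Location header of its 3xx record.
    Relative Locations are resolved against the URL they came from.
    Returns the URLs forming the cycle (first URL repeated at the end),
    or None if the chain ends or exceeds max_hops.
    """
    seen = []
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            return seen[seen.index(url):] + [url]
        seen.append(url)
        location = redirects.get(url)
        if location is None:
            return None  # chain ends at a capture that is not a redirect
        url = urljoin(url, location)
    return None
```

For example, `find_redirect_loop("http://a.com/", {"http://a.com/": "https://a.com/", "https://a.com/": "http://a.com/"})` returns `["http://a.com/", "https://a.com/", "http://a.com/"]`, which would suggest the final non-redirect capture was never recorded.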