-
If I crawl a website with mostly static resources, I'm noticing there can be missing resources in the resulting WARC. The reason for that is either broken links or timeouts.
I have written tools to…
-
This change breaks our `archive_paths: "webhdfs://server/"` because `os.path.join` just discards the prefix when the suffix is an absolute path.
https://github.com/webrecorder/pywb/blob/92e459bda52a…
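
A minimal sketch of the behavior described above, not taken from the pywb code: it uses `posixpath.join` (the POSIX implementation behind `os.path.join`) so the result is the same on every platform. The file names and the `join_archive_path` helper are hypothetical, shown only as one possible way to keep the prefix.

```python
import posixpath

prefix = "webhdfs://server/"

# A relative suffix is appended to the prefix as expected.
joined_relative = posixpath.join(prefix, "colls/a.warc.gz")

# An absolute suffix makes join() throw away everything before it.
joined_absolute = posixpath.join(prefix, "/colls/a.warc.gz")


def join_archive_path(prefix, name):
    """Hypothetical helper: concatenate so that the prefix always survives."""
    return prefix.rstrip("/") + "/" + name.lstrip("/")


print(joined_relative)   # webhdfs://server/colls/a.warc.gz
print(joined_absolute)   # /colls/a.warc.gz
print(join_archive_path(prefix, "/colls/a.warc.gz"))  # webhdfs://server/colls/a.warc.gz
```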
-
Research and develop solutions for capturing, preserving, describing and providing access to archived websites. Investigate use of WebRecorder, ReplayWeb, integration of WARC view with AtoM.
-
Images are missing both from the page itself and from the reconstructive logo. WARC created with a local webrecorder (built, run, and recorded using Docker and the webrecorder web interface): [temp-2018082…
-
See TODO in https://github.com/webrecorder/browsertrix-cloud/pull/1485/files, consolidate data table variations into either one `btrix-data-table` component, a CSS-class-based theme, or separate compo…
-
I am aware that browsertrix uses pywb in the background. I tried a website `https://www.kugou.com/` and noticed that some notable elements were missing with browsertrix.
I started pywb using
`docker run -e I…
-
I would prefer to have a settings menu where I can specify the location that the _warc_cache folder is stored, i.e. `~`, `/Volumes/Website\ Archives`, `G:\Data\Websites`.
Additionally, I would like…
-
I would like to know the position on this.
JWAT used the algorithm specified in the digest header directly.
So JWAT expects "SHA-256" since that seems to be the official name and the name supported …
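
To illustrate the naming question, here is a rough sketch of how a reader could accept both spellings of an algorithm label. This is not how JWAT works; it only assumes that labels such as "SHA-256" and "sha256" should map to the same `hashlib` algorithm, and both function names are hypothetical.

```python
import hashlib


def normalize_digest_algorithm(label):
    """Map labels such as 'SHA-256' or 'sha256' to a hashlib name (assumed convention)."""
    return label.strip().lower().replace("-", "")


def compute_digest(label, payload):
    """Hash payload with whichever algorithm the label names; return a hex digest."""
    return hashlib.new(normalize_digest_algorithm(label), payload).hexdigest()


# Both spellings select the same algorithm and give the same digest.
print(compute_digest("SHA-256", b"") == compute_digest("sha256", b""))
```

A strict reader would instead reject any label that does not match the expected name exactly, which is the behavior the question above is about.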
-
As we discussed in #3, we need some kind of archiving solution that can be trusted and that is good enough for archiving modern, JS-infested websites with potentially hidden content. We decided to use…
-
I'm trying to upload files captured with webrecorder to conifer, as webrecorder doesn't seem to have such difficulty accessing facebook. I downloaded the .wacz and unzipped the file and tried upload…
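
Since a .wacz file is a ZIP package, the WARC files inside it can be found without unpacking everything by hand. A small sketch using only the standard library follows. It assumes the WARC members can be recognised by their `.warc` or `.warc.gz` extension, and the `list_warc_members` helper and the `capture.wacz` file name are hypothetical.

```python
import zipfile


def list_warc_members(wacz_file):
    """Return the names of WARC files stored inside a WACZ (ZIP) package."""
    with zipfile.ZipFile(wacz_file) as package:
        return [
            name
            for name in package.namelist()
            # Assumption: WARC members are identified by extension only.
            if name.endswith((".warc", ".warc.gz"))
        ]


# Usage with a hypothetical file name:
# for name in list_warc_members("capture.wacz"):
#     print(name)
```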