-
Hi,
I have some WARC files created using [warcit](https://github.com/webrecorder/warcit). After indexing them (without errors or warnings), I can't find any of the pages they contain in SolrWayback. I…
-
```
$ pipx run --verbose fastwarc check /tmp/warcs/WARCPROX-20220315191329244-00000-icvgw961.warc
pipx >(setup:729): pipx version is 1.0.0
pipx >(setup:730): Default python interpreter is '/home/us…
```
-
This screen will produce a JSON that is then passed to the crawl config creation API endpoint.
The format includes a top-level dictionary with Browsertrix Cloud-specific options, and a `config` d…
-
## Describe the bug
Even after 2.6.2 #682, static resources are not being loaded when PyWB is deployed under a prefix. We deploy PyWB under e.g. `/wayback/` so the static resources are at `/waybac…
-
Hello! The Replayweb.page app (v1.4.0 through 1.5.2 on Windows 10 20H2) stops at 30-40% when loading any .warc file larger than 1 GB from this save of gamerankings.com: https://archive.fart.website/archivebot/viewer/job/9ux…
-
Since we now have many TBs of WACZ 1.1.1 files, we should have a ReSpec document that reflects the current version,
and maybe a separate draft that reflects the proposed changes for 1.2.
We should hav…
-
Use the `--driver` flag or customize `run.sh` to point the crawl to a custom JS file (inspired by the default: https://github.com/webrecorder/browsertrix-crawler/blob/main/defaultDriver.js). We need a…
-
Combine the text and page index. It will be a simpler structure.
-
Below is my command.
`workers` is set to 6, but it keeps coming out as 1.
My CPU has 6 cores and 12 threads, and the machine has 64 GB of RAM, so resources should not be the problem.
`docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertr…
-
I am developing a small tool that detects and classifies objects in images extracted from WARC archives. I use some functionality from warc-extractor.py in my Python code. In order to do that, I have to…