-
Hello! The ReplayWeb.page app (v1.4.0 to 1.5.2 on Windows 10 20H2) stops loading at 30-40% for any .warc file larger than 1 GB from this save of gamerankings.com: https://archive.fart.website/archivebot/viewer/job/9ux…
-
Combine the text index and the page index into a single index; it will be a simpler structure.
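A minimal sketch of what a combined structure could look like, assuming each page record carries a URL, a timestamp, a title, and extracted text. The `PageEntry` and `CombinedIndex` names are hypothetical and not taken from any existing codebase; the search is a naive substring match, used only to show that one structure can serve both lookups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PageEntry:
    # Hypothetical record: page metadata and extracted text stored side by side
    url: str
    ts: str
    title: str = ""
    text: str = ""


@dataclass
class CombinedIndex:
    # One mapping keyed by (url, timestamp) instead of separate page and text indexes
    entries: Dict[Tuple[str, str], PageEntry] = field(default_factory=dict)

    def add(self, entry: PageEntry) -> None:
        self.entries[(entry.url, entry.ts)] = entry

    def search(self, term: str) -> List[PageEntry]:
        # Naive case-insensitive substring match over title and text
        needle = term.lower()
        return [
            e for e in self.entries.values()
            if needle in e.title.lower() or needle in e.text.lower()
        ]


index = CombinedIndex()
index.add(PageEntry("https://example.com/", "20211010000000", "Example", "Example Domain"))
index.add(PageEntry("https://example.org/", "20211010000001", "Other", "Nothing here"))
```

Keeping the text next to the page metadata means a page lookup and a text search touch the same structure, at the cost of a larger entry per page.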
-
Below is my command.
`workers` is set to 6, but it keeps coming out as 1.
My CPU has 6 cores and 12 threads, and there is 64 GB of RAM, so resources should not be the problem.
`docker run -v $PWD/crawls:/crawls/ -it webrecorder/browsertr…
-
I am developing a small tool that detects and classifies objects in images extracted from WARC archives. I use some functionality from warc-extractor.py in my Python code. In order to do that, I have to…
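For context, a minimal stdlib-only sketch of pulling image payloads out of a WARC, assuming an uncompressed `.warc` file whose HTTP responses use neither chunked transfer encoding nor a content encoding such as gzip. `iter_image_payloads` is a hypothetical helper, not part of warc-extractor.py; for real archives (including `.warc.gz`), a dedicated reader such as warcio's `ArchiveIterator` is the more robust choice.

```python
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def _read_headers(stream: BinaryIO) -> Dict[str, str]:
    # Read "Name: value" lines until the blank line that ends a header block
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            return headers
        name, _, value = line.decode("utf-8", errors="replace").partition(":")
        headers[name.strip().lower()] = value.strip()


def iter_image_payloads(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, body) for every image response in an uncompressed WARC.

    Assumes no chunked transfer encoding and no content encoding in the
    archived HTTP responses; folded header lines are not handled.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # skip the CRLF pairs that separate records
        warc_headers = _read_headers(stream)
        length = int(warc_headers.get("content-length", "0"))
        block = stream.read(length)
        if warc_headers.get("warc-type") != "response":
            continue
        # A response block holds the full HTTP response: status line, headers, body
        http = io.BytesIO(block)
        http.readline()  # status line, e.g. "HTTP/1.1 200 OK"
        http_headers = _read_headers(http)
        if http_headers.get("content-type", "").startswith("image/"):
            yield warc_headers.get("warc-target-uri", ""), http.read()
```

Usage, with a hypothetical file name: `with open("example.warc", "rb") as f:` then iterate `for uri, body in iter_image_payloads(f):` and hand each `body` to the detector.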
-
- [x] Add sign up page
- [x] Add verification page
- [x] Show verification status in UI
-
Screenshot:
![image](https://user-images.githubusercontent.com/2303841/136691878-e1a1e781-d2b4-4cee-ac6c-bf997f84589d.png)
It seems to think it needs to crawl another page, but there aren't a…
-
Convert existing draft spec text to ReSpec!
-
- set up deploy user
- install Docker
- pull docker image for browsertrix: `docker pull webrecorder/browsertrix-crawler`
- create [browsertrix yaml crawl config file](https://github.com/webrecorder/…
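As an illustration of the last step, a minimal sketch that renders a small crawl config. It assumes the browsertrix-crawler YAML config accepts the same option names as the CLI (`seeds`, `collection`, `workers`, `generateWACZ`); `render_crawl_config` is a hypothetical helper, and the YAML is written by hand so it only handles flat values and seed URLs that need no quoting.

```python
from typing import List


def render_crawl_config(seeds: List[str], collection: str, workers: int = 1) -> str:
    # Hand-rolled YAML for a flat config; assumes seed URLs need no quoting
    lines = ["seeds:"]
    lines += [f"  - {url}" for url in seeds]
    lines += [
        f"collection: {collection}",
        f"workers: {workers}",
        "generateWACZ: true",
    ]
    return "\n".join(lines) + "\n"


config_text = render_crawl_config(["https://example.com/"], "example", workers=6)
```

Once `config_text` is saved as `crawl-config.yaml`, the file can be mounted into the container, for example `docker run -v $PWD/crawl-config.yaml:/app/crawl-config.yaml -v $PWD/crawls:/crawls/ webrecorder/browsertrix-crawler crawl --config /app/crawl-config.yaml` (paths are assumptions; option support may vary between crawler versions).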
-
As briefly discussed in https://github.com/openzim/zimit/issues/135, WARC assets **and** WARC headers are all stored under the Content area (`C/`) in Type 1 ("no-namespace") ZIMs. In legacy Type 0 ZI…
-
Allow multiple HDFS services, each using different IDs in TrackDB, e.g.:
`"id":"hdfs://hdfs:54310/1_data/npld/webrecorder/bl-your_stories/warcs/www.bl.uk-20150814093821.warc.gz"`
We add a …
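Since each ID already embeds its service as `hdfs://host:port`, one way to route an ID to the right service is to split that prefix off. A minimal sketch; `split_trackdb_id`, `service_for`, and the `SERVICES` mapping (with its "primary cluster" label) are hypothetical names, not part of TrackDB.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit


def split_trackdb_id(doc_id: str) -> Tuple[str, str]:
    """Split an ID like 'hdfs://host:port/path' into (service prefix, path)."""
    parts = urlsplit(doc_id)
    if parts.scheme != "hdfs" or not parts.netloc:
        raise ValueError(f"not an hdfs:// id: {doc_id!r}")
    return f"{parts.scheme}://{parts.netloc}", parts.path


# Hypothetical registry: one entry per HDFS service, keyed by the ID prefix
SERVICES: Dict[str, str] = {
    "hdfs://hdfs:54310": "primary cluster",
}


def service_for(doc_id: str) -> str:
    # Look up which configured service an ID belongs to
    service, _ = split_trackdb_id(doc_id)
    return SERVICES[service]
```

Keying the registry on the full `scheme://host:port` prefix means two services on the same host but different ports stay distinct.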