-
If the desired collection name passed to browsertrix via `--collection` contains `.`, `/`, `:`, or potentially other special characters, `pywb` silently fails to create the necessary directory structu…
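One way to guard against this before starting a crawl is to check the collection name up front. The sketch below is only an illustration: the helper names `is_safe_collection_name` and `sanitize_collection_name` are hypothetical, and the allowed character set (letters, digits, `_`, `-`) is an assumption, not a documented browsertrix or `pywb` rule.

```python
import re

# Assumption: only ASCII letters, digits, "_" and "-" are treated as safe.
# This is an illustrative choice, not a rule taken from browsertrix or pywb.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_safe_collection_name(name: str) -> bool:
    """Return True if the whole name consists of safe characters only."""
    return SAFE_NAME.fullmatch(name) is not None


def sanitize_collection_name(name: str) -> str:
    """Replace each run of unsafe characters with a single "-"."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "-", name).strip("-")
    if not cleaned:
        raise ValueError(f"no usable characters in collection name: {name!r}")
    return cleaned


if __name__ == "__main__":
    requested = "example.com/a:b"
    if not is_safe_collection_name(requested):
        print(f"{requested!r} -> {sanitize_collection_name(requested)!r}")
```

Failing loudly, or rewriting the name before passing it to `--collection`, at least makes the problem visible instead of silent.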
-
### Description
At some point YouTube updated its site, and now all (?) captions generated by Synapse for the site are:
Before you continue to YouTube
Sign in a Google company Before you con…
-
Vikingpress got blocked in our school so please please help!
-
Hello!
Thanks for your great work on this project. We have recently switched from using Webrecorder desktop and I'm enjoying exploring archiveweb.page.
I hope this question isn't too specific, …
-
When scraping a website encoded in Windows Cyrillic (windows-1251), the conversion to UTF-8 is faulty, resulting in tons of `пїЅпїЅпїЅпїЅпїЅ` strings.
- Sample website: https://sattvinfo.net/
- Sa…
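The `пїЅ` pattern itself is a useful clue: it is what the UTF-8 bytes of the replacement character U+FFFD look like when read as windows-1251. The sketch below reproduces the effect with a made-up sample string. It assumes the text is decoded with the wrong charset somewhere along the way, which may or may not be what actually happens in the scraper.

```python
# Hypothetical sample text; any Cyrillic string shows the same effect.
original = "Привет"

# Bytes as a windows-1251 site would serve them.
raw = original.encode("windows-1251")

# Step 1: the bytes are wrongly decoded as UTF-8. Every byte is invalid
# in this context, so each one is replaced with U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Step 2: the replacement characters are written out as UTF-8
# (bytes EF BF BD) and then displayed as windows-1251, giving "пїЅ".
mojibake = wrong.encode("utf-8").decode("windows-1251")

# Decoding with the declared charset recovers the text.
correct = raw.decode("windows-1251")

if __name__ == "__main__":
    print(mojibake)
    print(correct)
```

Once the bytes have been replaced by U+FFFD, the original text is lost, so the fix has to happen at the first decoding step.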
-
If one is recording a site of video content (especially video content which repeats upon, say, a reload or clicking on the link again), the files become huge. Having the ability to intelligently dedup…
-
I have successfully configured OpenWayback, but I am unsure where to put the .warc file and how to access it afterwards.
-
Installed Heritrix 3.3.0 on a Linux server. (3.4.0 fails consistently when editing a configuration.) Out-of-the-box configuration, just set the seed and the operatorContactUrl.
I tell it to crawl…
-
Hi,
I'm trying to collect old news and the problem is that the API response to my query results in low outputs. I show below an example of a news article published by Público in March 2012, availab…
-
Investigating https://github.com/openzim/zimit/issues/71 I realized I can't seem to be able to scrape videos reliably with the current version.
Even a very simple test doesn't work:
- https://w…