-
Hello,
I would like to know if it is possible to get both WARC files compressed (not only the metadata one).
Thanks
nasry updated
6 years ago
-
It is currently available and causes no UI change.
-
NLW report that these websites should all have instances continually from 2004 onwards:
http://www.bloc.org.uk/
http://www.enlli.org/
http://www.morfablog.com/
http://www.cymruarywe.org/
http:…
-
Pool examples? Plugin module? Host-based rules, over time. Put together with canonicalisation rules topic.
-
I've got a Monitrix instance watching 4 separate, active crawlers. Rather than reading each concurrently, it seems to round-robin, reading ~100,000 lines from one before moving to the next.
Is it poss…
-
Motivation:
* To allow the recording of messages using a different representation to their wire message format as
- the wire protocol may be suboptimal for the purposes of storage and replay; or
…
-
- Choose HTTP library:
- [Request](https://github.com/request/request)
- [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [scrapy](https://github.com/…
-
As reported by Ed Pinsent of the University of London:
Downloaded http://matkelly.com/wail_old/release/WAIL%200.2013.2.19.zip, unzipped, moved packages to C:\WAIL
When I run wail.exe, I get the foll…
-
Using the [CDXGenerator](https://github.com/internetarchive/ia-hadoop-tools/blob/master/src/main/java/org/archive/hadoop/jobs/CDXGenerator.java) I generated CDX files out of WARCs. Unfortunately, the …
-
Hi, I've been looking to run some crawls of my organisation's SharePoint/intranet site, but I'm having some issues getting through Microsoft 2FA.
Using --interactive successfully crea…