Open Arkiver2 opened 9 years ago
Rough design idea:
--disable-warc argument to disable export
Which warc library should I use? IA's warc seems to be incompatible with Python 3.
If you want to record to WARC files easily, you'll need an agent that can record HTTP traffic accurately to WARC files. Examples include Heritrix, Wget, and Wpull, but these are all web crawlers.
If you can get the raw HTTP requests and responses from Python Requests, then you can try to build a WARC file yourself. I wrote a WARC library called Warcat, which supports Python 3. I also wrote Wpull, which runs under Python 3, and maybe you can take code from it.
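As a rough illustration of the "build a WARC file yourself" route, here is a minimal sketch that wraps raw HTTP bytes in a WARC/1.0 record using only the standard library. It does not use Warcat or Wpull; the function names build_warc_record and append_record are hypothetical, and it assumes you already have the exact request or response bytes (status line, headers, and body) from your HTTP client. A complete WARC would normally also carry a warcinfo record and payload digests, which are left out here.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(record_type, target_uri, http_bytes):
    """Wrap raw HTTP bytes in one uncompressed WARC/1.0 record.

    record_type is "request" or "response" (hypothetical helper,
    not part of any library).
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        f"WARC-Type: {record_type}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: application/http; msgtype={record_type}",
        f"Content-Length: {len(http_bytes)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # A record is its header block, the content block, then two CRLFs.
    return head + http_bytes + b"\r\n\r\n"


def append_record(path, record):
    """Append one record to a .warc.gz file as its own gzip member."""
    with open(path, "ab") as f:
        f.write(gzip.compress(record))
```

Compressing each record as a separate gzip member is the usual layout for .warc.gz files, since it lets tools seek to a single record without decompressing the whole file.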
It would be very useful if warc.gz files were also made for the URL shorteners we are archiving. The chance of people looking in the Wayback Machine for a (shortener) URL is probably bigger than the chance of them looking through the .xz files for the shortener they want.