cjcodeproj / medialibrary

Python code to read XML media files
MIT License

Investigate MHTML, Moz Archive Format, and WARC #207

Open cjcodeproj opened 1 month ago

cjcodeproj commented 1 month ago

Investigate a format that packages one or more web pages into a single file that any web browser can read.

The goal is to generate a report on media content in an offline format readable by web browsers.

Resources:

https://en.wikipedia.org/wiki/WARC_(file_format)
https://wiki.archiveteam.org/index.php/Wget_with_WARC_output
https://www.amadzone.org/mozilla-archive-format/maff-specification.html
https://en.wikipedia.org/wiki/Mozilla_Archive_Format
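
To get a feel for the MHTML option, here is a minimal sketch using only the Python standard library. It relies on MHTML being a MIME `multipart/related` message in which each part carries a `Content-Location` header. The function name `build_mhtml`, the `resources` mapping, and the subject line are hypothetical and not part of medialibrary. Browser support for MHTML varies, so whether a given browser opens the result still needs to be checked as part of this investigation.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html, page_url, resources=None):
    """Package one HTML page and optional binary resources as MHTML bytes.

    html      -- the report page as a string
    page_url  -- URL recorded as the page's Content-Location
    resources -- hypothetical mapping of URL -> (mime_type, raw_bytes) for
                 images or other files that the page references
    """
    # The container is multipart/related, with the root type declared as HTML.
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Media report"  # placeholder subject, an assumption

    # The first part is the HTML document itself.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)

    # Each resource becomes its own base64-encoded part.
    for url, (mime_type, data) in (resources or {}).items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()
```

Writing the returned bytes to a file with an `.mhtml` extension would give a single-file report candidate to try in each target browser. WARC and MAFF would need a different approach (MAFF is ZIP-based, WARC is a record-oriented archive format), so this sketch covers only one of the three formats under investigation.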