ArchiveTeam / wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
https://www.archiveteam.org/
GNU General Public License v3.0

WARC-Payload-Digest to be calculated over entity-body #13

Closed by Arkiver2 1 year ago

Arkiver2 commented 2 years ago

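According to the WARC specification:

The payload of an application/http block is its ‘entity-body’ (per [RFC2616]).

This is currently not done when Transfer-Encoding is present. Per RFC 2616, the entity-body is obtained from the message-body by decoding any Transfer-Encoding, so WARC-Payload-Digest should be calculated over the decoded data.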
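
To illustrate the difference, here is a minimal Python sketch, not the wget-lua implementation. It assumes a chunked Transfer-Encoding and a SHA-1 digest written as "sha1:" followed by Base32, which is a common convention for WARC-Payload-Digest; the function names decode_chunked and payload_digest are hypothetical, and trailer headers are ignored.

```python
import base64
import hashlib


def decode_chunked(message_body: bytes) -> bytes:
    """Strip HTTP/1.1 chunked framing and return the entity-body (hypothetical helper)."""
    entity_body = bytearray()
    pos = 0
    while True:
        line_end = message_body.index(b"\r\n", pos)
        # Chunk-size line: a hex size, optionally followed by ";extensions".
        size = int(message_body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last-chunk reached; any trailer headers are ignored here
        entity_body += message_body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(entity_body)


def payload_digest(data: bytes) -> str:
    """Format a SHA-1 digest as 'sha1:<base32>' (assumed digest convention)."""
    return "sha1:" + base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")


# Message-body as it appears on the wire, with chunked framing.
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
body = decode_chunked(wire)

print(body)
print("digest over entity-body: ", payload_digest(body))
print("digest over message-body:", payload_digest(wire))
```

The two digests differ because the second one also hashes the chunk-size lines and CRLF framing. Only the first is calculated over the entity-body.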

Arkiver2 commented 1 year ago

This is fixed with https://github.com/ArchiveTeam/wget-lua/commit/4fda5f2cc5188a1db15ff5b0d535d56aafc5a4de.