The CC-NEWS WARC files contain the literal values of the HTTP header fields Content-Encoding, Transfer-Encoding and Content-Length, although the payload is stored unchunked and uncompressed.
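A minimal sketch of how affected records could be detected. It assumes the record's HTTP headers are available as a list of (name, value) pairs and the payload as the stored, already decoded bytes. The function name find_stale_headers and this header representation are hypothetical and not part of any WARC library. Any Transfer-Encoding header is treated as stale because the payload is stored unchunked.

```python
from typing import List, Tuple

# Hypothetical representation of one HTTP header: (name, value).
Header = Tuple[str, str]


def find_stale_headers(headers: List[Header], payload: bytes) -> List[str]:
    """List headers that contradict a payload stored unchunked and uncompressed."""
    problems: List[str] = []
    for name, value in headers:
        lname = name.lower()
        if lname == "content-encoding" and value.strip().lower() != "identity":
            # The payload was decompressed, so any real encoding is stale.
            problems.append(f"{name}: {value} but payload is uncompressed")
        elif lname == "transfer-encoding":
            # The payload was de-chunked, so any transfer coding is stale.
            problems.append(f"{name}: {value} but payload is unchunked")
        elif lname == "content-length":
            try:
                declared = int(value.strip())
            except ValueError:
                problems.append(f"{name}: {value!r} is not an integer")
                continue
            if declared != len(payload):
                problems.append(f"{name}: {declared} but payload has {len(payload)} bytes")
    return problems


print(find_stale_headers([("Content-Encoding", "gzip"), ("Content-Length", "120")], b"<html></html>"))
# ['Content-Encoding: gzip but payload is uncompressed', 'Content-Length: 120 but payload has 13 bytes']
```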
- The header fields Content-Encoding and Transfer-Encoding should be masked by a prefix (the CC-MAIN WARC files use X-Crawler-).
- If the value of Content-Length is wrong because the Content-Encoding was changed (i.e., the payload was decompressed), the original HTTP header should be masked and the correct value should be given in the Content-Length header.
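The proposed fix can be sketched as follows, using the same hypothetical (name, value) header representation. The function mask_stale_headers is illustrative only. It assumes the masked Content-Length also gets the X-Crawler- prefix, and it adds a corrected Content-Length only when an original value was present but wrong.

```python
from typing import List, Tuple

# Hypothetical representation of one HTTP header: (name, value).
Header = Tuple[str, str]

# Prefix for masked headers; CC-MAIN uses "X-Crawler-". Applying it to
# Content-Length as well is an assumption of this sketch.
MASK_PREFIX = "X-Crawler-"

# Headers that no longer describe a payload stored unchunked and uncompressed.
ENCODING_HEADERS = {"content-encoding", "transfer-encoding"}


def _matches_length(value: str, length: int) -> bool:
    """True if a Content-Length value parses to the given byte count."""
    try:
        return int(value.strip()) == length
    except ValueError:
        return False


def mask_stale_headers(headers: List[Header], payload: bytes) -> List[Header]:
    """Return a copy of the headers that is consistent with the decoded payload."""
    result: List[Header] = []
    saw_wrong_length = False
    kept_correct_length = False
    for name, value in headers:
        lname = name.lower()
        if lname in ENCODING_HEADERS:
            # Keep the original value, but under a masked name.
            result.append((MASK_PREFIX + name, value))
        elif lname == "content-length":
            if _matches_length(value, len(payload)):
                result.append((name, value))
                kept_correct_length = True
            else:
                # Mask the wrong value so it is preserved but not interpreted.
                result.append((MASK_PREFIX + name, value))
                saw_wrong_length = True
        else:
            result.append((name, value))
    # Add the corrected Content-Length only if a wrong one was masked.
    if saw_wrong_length and not kept_correct_length:
        result.append(("Content-Length", str(len(payload))))
    return result


headers = [
    ("Content-Type", "text/html"),
    ("Content-Encoding", "gzip"),
    ("Content-Length", "120"),
]
for name, value in mask_stale_headers(headers, b"<html></html>"):
    print(f"{name}: {value}")
# Content-Type: text/html
# X-Crawler-Content-Encoding: gzip
# X-Crawler-Content-Length: 120
# Content-Length: 13
```

Masking rather than dropping keeps the original server response recoverable, while clients that parse the record only see headers that match the stored payload.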
Thanks, @wumpus for detecting this!