Nutch's protocol-okhttp plugin has supported HTTP/2 since its introduction in 2018. The WARC writer, however, does not.
The following points need to be addressed:
[x] protocol-okhttp: record HTTP and SSL/TLS versions, see NUTCH-3062
[x] ensure that the HTTP headers in the WARC request and response records use "HTTP/1.1" in the request line and the status line, respectively. This is required for backward compatibility with WARC readers. Cf. commoncrawl/news-crawl#42 and apache/incubator-stormcrawler#1010.
[x] save the true protocol version in the WARC-Protocol field, see iipc/warc-specifications#42
[x] also save the SSL/TLS version in a second WARC-Protocol field
[x] if required, normalize the protocol name(s)
[x] save the cipher suite in the WARC-Cipher-Suite field, see iipc/warc-specifications#86
[x] optionally add a metrics counter that tracks which protocol versions are used
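
The checklist items above could be sketched roughly as follows. This is only an illustration, not the actual Nutch implementation: the class `WarcProtocolSketch` and all of its methods are hypothetical, and the normalized values ("h2", "tls/1.3", etc.) are assumed to follow the naming discussed in iipc/warc-specifications#42. The input TLS names are assumed to look like the JSSE-style strings "TLSv1.3" or "SSLv3".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; every name here is hypothetical, not a Nutch API. */
final class WarcProtocolSketch {

  /** Simple in-memory counter of normalized protocol names (stand-in for real metrics). */
  static final Map<String, Long> COUNTS = new TreeMap<>();

  /** Rewrites the version at the start of a status line, e.g. "HTTP/2 200" to "HTTP/1.1 200". */
  static String toHttp11StatusLine(String statusLine) {
    return statusLine.replaceFirst("^HTTP/\\d+(\\.\\d+)?", "HTTP/1.1");
  }

  /** Rewrites the version at the end of a request line, e.g. "GET / HTTP/2" to "GET / HTTP/1.1". */
  static String toHttp11RequestLine(String requestLine) {
    return requestLine.replaceFirst("HTTP/\\d+(\\.\\d+)?$", "HTTP/1.1");
  }

  /** Normalizes an HTTP version string; the target names are an assumption. */
  static String normalizeHttp(String version) {
    String v = version.trim().toLowerCase(Locale.ROOT);
    if (v.equals("http/2") || v.equals("http/2.0")) return "h2";
    if (v.equals("http/3") || v.equals("http/3.0")) return "h3";
    return v; // e.g. "http/1.1", or "h2" if already normalized
  }

  /** Normalizes a JSSE-style TLS/SSL name, e.g. "TLSv1.3" to "tls/1.3" (assumed format). */
  static String normalizeTls(String version) {
    String v = version.trim().toLowerCase(Locale.ROOT);
    if (v.startsWith("tlsv")) {
      String num = v.substring(4);
      return "tls/" + (num.contains(".") ? num : num + ".0"); // "TLSv1" becomes "tls/1.0"
    }
    if (v.startsWith("sslv")) return "ssl/" + v.substring(4);
    return v;
  }

  /** Builds one WARC-Protocol field per protocol and counts each normalized name. */
  static List<String> warcProtocolFields(String httpVersion, String tlsVersion) {
    List<String> fields = new ArrayList<>();
    String http = normalizeHttp(httpVersion);
    fields.add("WARC-Protocol: " + http);
    COUNTS.merge(http, 1L, Long::sum);
    if (tlsVersion != null) { // plain HTTP has no TLS version
      String tls = normalizeTls(tlsVersion);
      fields.add("WARC-Protocol: " + tls);
      COUNTS.merge(tls, 1L, Long::sum);
    }
    return fields;
  }

  public static void main(String[] args) {
    System.out.println(toHttp11StatusLine("HTTP/2 200"));
    System.out.println(toHttp11RequestLine("GET /index.html HTTP/2"));
    warcProtocolFields("HTTP/2", "TLSv1.3").forEach(System.out::println);
    System.out.println(COUNTS);
  }
}
```

Keeping the rewrite of the request and status lines separate from the WARC-Protocol fields means older WARC readers still see "HTTP/1.1", while the true protocol version stays available in the record's WARC headers.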