The logging of the classes ExtractingResourceProducer and GZIPMemberSeries is very verbose and produces multiple log messages per transformed WARC record:
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries returnBytes
INFO: Returned (3165)bytes
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(8 bytes) bufferSize(3165)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: getNextMember
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(3 bytes) bufferSize(3157)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: AlignedResult:0
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries read
INFO: read(7 bytes) bufferSize(3154)
Oct 07, 2023 5:41:49 PM org.archive.format.gzip.GZIPMemberSeries getNextMember
INFO: Read next GZip header...
Oct 07, 2023 5:41:49 PM org.archive.extract.ExtractingResourceProducer getNext
INFO: Extracting (class org.archive.resource.warc.WARCResource) with (class org.archive.resource.http.HTTPResponseResourceFactory)
These messages generate 40+ MB of log output per WARC file (about 1 GiB in size). To keep the log files from being flooded, this PR changes the log level of these messages from INFO to FINE. The level of messages that might indicate potential causes of errors is left as is.
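
For illustration, here is a minimal, self-contained sketch of the effect of moving a message from INFO to FINE with java.util.logging. It is not the code changed in this PR: the class LogLevelSketch, the logger name prefix "sketch.GZIPMemberSeries" and the warning text are hypothetical, and only the first message is copied from the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevelSketch {

    /** Collects published records so the effect of the logger level can be inspected. */
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getLevel() + ": " + record.getMessage());
            }
        }

        @Override
        public void flush() {
        }

        @Override
        public void close() {
        }
    }

    /** Emits one per-record message at FINE and one problem message at WARNING. */
    static List<String> emit(Level loggerLevel) {
        // Hypothetical logger name; the real classes may name their loggers differently.
        Logger logger = Logger.getLogger("sketch.GZIPMemberSeries." + loggerLevel.getName());
        logger.setUseParentHandlers(false); // keep the sketch quiet on the console
        logger.setLevel(loggerLevel);

        CollectingHandler handler = new CollectingHandler();
        handler.setLevel(Level.ALL); // let the logger level alone decide what passes
        logger.addHandler(handler);

        // Per-record chatter, logged at FINE instead of INFO.
        logger.fine("read(8 bytes) bufferSize(3165)");
        // Hypothetical message pointing at a potential error cause; its level is unchanged.
        logger.warning("hypothetical problem while reading gzip member");

        logger.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        // At INFO, the usual default, only the warning is kept.
        System.out.println(emit(Level.INFO));
        // At FINE, the per-record message is kept as well.
        System.out.println(emit(Level.FINE));
    }
}
```

Assuming the loggers are named after their classes (not verified here), the per-record messages could still be re-enabled for debugging by setting the level of those loggers and of the handler in use to FINE.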