chatnoir-eu / chatnoir-resiliparse

A robust web archive analytics toolkit
https://resiliparse.chatnoir.eu
Apache License 2.0

Fix ArchiveIterator min_content_length behavior #21

Closed jmfrees closed 2 years ago

jmfrees commented 2 years ago

The current implementation does not respect min_content_length as it should according to the documentation.
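For readers unfamiliar with the parameter, here is a minimal sketch of the filtering semantics the documentation appears to describe: records whose content length is below `min_content_length` are skipped. This is not FastWARC's code. `Record` and `iter_records` are hypothetical names made up for illustration, and treating a negative bound as "disabled" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Record:
    """Hypothetical stand-in for a WARC record; not FastWARC's WarcRecord."""
    record_id: str
    content_length: int


def iter_records(records: Iterable[Record],
                 min_content_length: int = -1,
                 max_content_length: int = -1) -> Iterator[Record]:
    """Yield only records whose content length lies within the given bounds.

    A negative bound disables that check (assumed convention).
    """
    for record in records:
        # Skip records that are shorter than the requested minimum.
        if min_content_length >= 0 and record.content_length < min_content_length:
            continue
        # Skip records that are longer than the requested maximum.
        if max_content_length >= 0 and record.content_length > max_content_length:
            continue
        yield record


records = [Record("a", 0), Record("b", 100), Record("c", 5000)]
kept = [r.record_id for r in iter_records(records, min_content_length=100)]
print(kept)
```

With `min_content_length=100`, the empty record `a` is dropped while `b` and `c` pass through; a record exactly at the bound is kept because the comparison is strict.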

jmfrees commented 2 years ago

Please let me know if I need to open an associated issue.

codecov[bot] commented 2 years ago

Codecov Report

Merging #21 (4efd6df) into develop (aa25fcf) will not change coverage. The diff coverage is 100.00%.

Note: Current head 4efd6df differs from the pull request's most recent head 03e164e. Consider uploading reports for commit 03e164e to get more accurate results.

@@           Coverage Diff            @@
##           develop      #21   +/-   ##
========================================
  Coverage    90.86%   90.86%           
========================================
  Files           21       21           
  Lines         2956     2956           
========================================
  Hits          2686     2686           
  Misses         270      270           

Impacted Files | Coverage Δ
fastwarc/fastwarc/warc.pyx | 94.68% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update aa25fcf...03e164e.

jmfrees commented 2 years ago

Making those changes now. My formatting options were set to automatic, so the whitespace changes were a side effect.

jmfrees commented 2 years ago

> I have set mine to leave alone lines that were not changed. Prevents a lot of extra diffs with existing code.

That's an excellent idea. I may have to update my configs for that. Thanks!