Closed T-3B closed 1 year ago
Hi, the reason there is a size restriction is that there has never been rigorous testing done with very large files. In the deflate code used for PNG/ZIP/GZIP, there are many places that use bit counters, which can easily become larger than UINT_MAX even on smaller files. With the present size limit I have not encountered such issues so far. Removing it would require quite a bit of reviewing and testing to avoid files not being handled properly, so I'll keep this as-is for now.
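To illustrate the bit-counter concern, here is a minimal sketch. It assumes a 32-bit unsigned counter, which is what UINT_MAX corresponds to on common platforms. The function names are hypothetical and not taken from ECT. A counter that tracks bits rather than bytes reaches 2^32 at a byte offset of only 2^29, that is 512 MiB, well below a 4 GiB file size.

```c
#include <stdint.h>

/* Hypothetical helper, not ECT code: byte offset to bit offset using a
   32-bit counter. The cast silently wraps modulo 2^32. */
uint32_t bit_offset_32(uint64_t byte_offset) {
    return (uint32_t)(byte_offset * 8u);
}

/* The same conversion with a 64-bit counter, which does not wrap for any
   realistic file size. */
uint64_t bit_offset_64(uint64_t byte_offset) {
    return byte_offset * 8u;
}

/* Returns 1 if the true bit offset still fits in a 32-bit counter. */
int bit_offset_fits_32(uint64_t byte_offset) {
    return bit_offset_64(byte_offset) <= UINT32_MAX;
}
```

With these definitions, bit_offset_fits_32 is true for offsets up to 2^29 - 1 bytes and false from 2^29 bytes onward, where bit_offset_32 wraps back to 0. Whether a given counter in the real code is affected depends on what it counts and how it is used, which is why a review would be needed.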
I support @fhanau’s decision here because other tools run into such troubles, too. For example, DeflOpt silently breaks files of 512 MB or more.
I’d approve the change if we had a set of test files and successfully ran the modified build with address sanitizer / undefined behavior sanitizer on them. Unfortunately, I don’t know how such a test file set would look.
Sorry for my ignorance.
On the other hand, I think it's possible to support larger input files (but I don't know how long it would take to make this change). There is a "great" zopfli mod (supporting GZIP, ZIP, PNG), which states:
Maximum supported input file size: 17592186044415MB.
That is about 16 EiB (2^44 MiB = 2^64 bytes). I say "great" because it almost always produces a larger output than ECT.
Do not hesitate to close this PR whenever you want, since changing a single condition won't solve the issue.
Yeah, for ECT to properly support huge files it would be best if it had been designed to support them from the start. I doubt that krzych-zopfli actually supports files of this size: it is impossible to test, and the cost model, which estimates the bit cost at a certain position using double precision, becomes too inaccurate to do optimal parsing effectively at around 2^50 bytes. Still, it does look better equipped to deal with very large files, so that's certainly an option.
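The precision point can be checked with a small sketch. This is not the cost model from ECT or krzych-zopfli, just an illustration of how double precision behaves at that magnitude, and increment_is_lost is a hypothetical name. A double has a 53-bit significand, so near 2^50 adjacent representable values are 0.25 apart, and a fractional cost smaller than half of that vanishes when added to a running total.

```c
/* Hypothetical helper: returns 1 if adding `increment` to `total` leaves
   `total` unchanged in double precision. The volatile store forces the
   sum to be rounded to a 64-bit double before the comparison. */
int increment_is_lost(double total, double increment) {
    volatile double sum = total + increment;
    return sum == total;
}
```

For a running total of 2^50 (1125899906842624.0), an increment of 0.1 is lost, while the same increment is preserved for a total of 2^20. At 2^53 even an increment of 1.0 is lost.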
I found this quite restrictive, since it is just an if condition: bigger files do work well with ECT, I just tried with a 2GB gzip file. I changed the check to <= UINT_MAX to have the same behavior as line 427 and line 470.