Closed jribbens closed 4 months ago
May be because of this case
The way I read it, there could be a file footer (a data descriptor) if the CRC and length are not known ahead of time. This footer can be between 12 and 24 bytes long.
So should we just bump the buffer to something like 100 bytes? I will check this on a .docx file.
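For reference, the 12 to 24 byte range follows from the data descriptor layout in the ZIP format: an optional 4-byte signature, a 4-byte CRC-32, and two size fields that are 4 bytes each (8 bytes each for ZIP64). The sketch below only illustrates that arithmetic; `dataDescriptorSize` and `hasDataDescriptor` are hypothetical helper names, not part of unzipper.

```javascript
// Hypothetical helpers for illustration only; not unzipper APIs.

// Size in bytes of the data descriptor that may follow an entry's data.
function dataDescriptorSize({ hasSignature, zip64 }) {
  const signature = hasSignature ? 4 : 0; // optional 0x08074b50 marker
  const crc32 = 4;
  const sizeField = zip64 ? 8 : 4; // compressed size and uncompressed size
  return signature + crc32 + 2 * sizeField;
}

// Bit 3 of the general purpose flags signals that a data descriptor follows.
function hasDataDescriptor(generalPurposeFlags) {
  return (generalPurposeFlags & 0x08) !== 0;
}

console.log(dataDescriptorSize({ hasSignature: false, zip64: false })); // 12
console.log(dataDescriptorSize({ hasSignature: true, zip64: true })); // 24
```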
So I created a simple docx file and noticed a very annoying discrepancy. It seems that the central directory shows an extraFieldLength of 0 but the local file header shows a positive length. This is a violation of the spec, as the central directory is supposed to be the source of truth. There are two ways to approach this:
The second option is a little bit annoying to implement since we are still using .then and can't simply return a new call to unzip. It might be worthwhile changing the function to async/await to make it easier.
@jribbens @jpambrun can you please check out https://github.com/ZJONSSON/node-unzipper/blob/c80fd62cd8d387da247ea6b1802c34ae14ee838e/lib/Open/unzip.js#L37-L46 (part of https://github.com/ZJONSSON/node-unzipper/pull/312)
Here, I compare the size calculated from the local header to the size requested using the central directory and retry the unzip method if the local header points to a larger stream.
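A rough sketch of that comparison, under stated assumptions: this is not the code from the linked PR. `localEntrySize` and `needsRetry` are hypothetical names, and the field offsets are those of the standard 30-byte local file header. If the entry uses a data descriptor, the compressed size in the local header is 0, so this simple version would under-count.

```javascript
// Illustrative sketch only; names are hypothetical and not taken from unzipper.
const LOCAL_HEADER_FIXED_SIZE = 30;
const LOCAL_HEADER_SIGNATURE = 0x04034b50;

// Total bytes occupied by an entry, as described by its local file header.
function localEntrySize(headerBytes) {
  if (headerBytes.readUInt32LE(0) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error('Not a local file header');
  }
  const compressedSize = headerBytes.readUInt32LE(18);
  const fileNameLength = headerBytes.readUInt16LE(26);
  const extraFieldLength = headerBytes.readUInt16LE(28);
  return LOCAL_HEADER_FIXED_SIZE + fileNameLength + extraFieldLength + compressedSize;
}

// True when the range requested from the central directory data is too short.
function needsRetry(requestedSize, headerBytes) {
  return localEntrySize(headerBytes) > requestedSize;
}

// Example: the central directory says extraFieldLength is 0, the local header says 20.
const header = Buffer.alloc(LOCAL_HEADER_FIXED_SIZE);
header.writeUInt32LE(LOCAL_HEADER_SIGNATURE, 0);
header.writeUInt32LE(100, 18); // compressed size
header.writeUInt16LE(17, 26); // file name length ("word/document.xml")
header.writeUInt16LE(20, 28); // extra field length in the local header

const sizeFromCentralDirectory = 30 + 17 + 0 + 100; // 147
console.log(localEntrySize(header)); // 167
console.log(needsRetry(sizeFromCentralDirectory, header)); // true
console.log(needsRetry(sizeFromCentralDirectory + 1000, header)); // false
```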
It's an interesting approach, but I am thinking that for a file where all entries are like that, it would stream the whole file twice and performance would be cut in half?
Another idea is to have configurable extra padding and to raise an error with a helpful message when it's not enough (e.g. Local header for entry XYZ is 76 bytes larger than specified in the central directory, consider using a padding of 76 or more.), and let people tweak this for their application's use case?
Yet another idea is to continue with the short stream, issue a new range request only for the missing bytes, and switch the downloading streams at the end. That feels a bit complex and error prone.
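The error-with-hint idea could look roughly like the following. This is only a sketch of the suggestion, not an existing unzipper feature; `assertPaddingSufficient` and its parameters are hypothetical.

```javascript
// Hypothetical check illustrating the suggestion; not an unzipper API.
function assertPaddingSufficient(entryPath, centralSize, localSize, padding) {
  // How many more bytes the local header describes than the central directory did.
  const shortfall = localSize - centralSize;
  if (shortfall > padding) {
    throw new Error(
      `Local header for entry ${entryPath} is ${shortfall} bytes larger than ` +
      `specified in the central directory, consider using a padding of ${shortfall} or more.`
    );
  }
}

// Passes silently: the configured padding covers the 76 extra bytes.
assertPaddingSufficient('XYZ', 147, 223, 100);

// Throws: a padding of 50 is not enough for a 76-byte shortfall.
try {
  assertPaddingSufficient('XYZ', 147, 223, 50);
} catch (err) {
  console.log(err.message);
}
```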
In this PR, if you look more closely, I have made the padding configurable but default to 1k. https://github.com/ZJONSSON/node-unzipper/blob/ca37f3b3c9a6ac4b5e33a4d70e4d9e341851a470/lib/Open/directory.js#L202-L208
So checking for localSize is just a fallback for when padding is not enough (and it also emits a streamRetry event that you can listen to).
@ZJONSSON I can confirm this fixes the problem in my tests.
A minor point: padding can't be set to zero in this case, as it falls back to 1000.
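One common cause of this behaviour is applying a default with `||`, which treats 0 as "not set". Whether the linked code does exactly this is an assumption; the sketch below only contrasts the two patterns, and the function names are hypothetical. The `??` operator requires Node.js 14 or later.

```javascript
// Illustrative only; assumes the default is applied with a falsy check.
const DEFAULT_PADDING = 1000;

// `||` falls back on any falsy value, including an explicit 0.
function paddingWithOr(options = {}) {
  return options.padding || DEFAULT_PADDING;
}

// `??` falls back only on null or undefined, so 0 is respected.
function paddingWithNullish(options = {}) {
  return options.padding ?? DEFAULT_PADDING;
}

console.log(paddingWithOr({ padding: 0 })); // 1000
console.log(paddingWithNullish({ padding: 0 })); // 0
console.log(paddingWithNullish({})); // 1000
```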
Summary
unzipper 0.11.4 and earlier work fine for reading DOCX files (which are zip files). 0.11.5, however, is completely broken and crashes with:
Error: FILE_ENDED
Sample code
Output under 0.11.4
Output under 0.11.5
Example input file
The example input file is downloadable here: https://unequivocal.eu/dl/example0.docx