codedread / bitjs

Binary Tools for JavaScript
MIT License

Support data descriptors in zip files #19

Closed: codedread closed this issue 4 years ago

codedread commented 4 years ago

In unzip.js we have the following TODO:

// TODO: deal with data descriptor if present (we currently assume no data descriptor!)
// "This descriptor exists only if bit 3 of the general purpose bit flag is set"
// But how do you figure out how big the file data is if you don't know the compressedSize
// from the header?!?
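For context, bit 3 lives in the 2-byte general purpose bit flag at offset 6 of each local file header. When it is set, writers typically leave the CRC-32 and size fields in the local header as zero and write a data descriptor after the file data instead: an optional signature `0x08074b50`, then CRC-32, compressed size and uncompressed size (4 bytes each, or 8-byte sizes for ZIP64). Below is a minimal sketch of detecting this from a local header, assuming non-ZIP64 entries and offsets taken from PKWARE's APPNOTE.TXT. The function name `readLocalFileHeader` is hypothetical and is not bitjs's actual code.

```javascript
// Hypothetical helper: parse a ZIP local file header and report whether the
// entry uses a trailing data descriptor (general purpose bit flag, bit 3).
const LOCAL_FILE_HEADER_SIG = 0x04034b50; // "PK\x03\x04"

function readLocalFileHeader(bytes, offset = 0) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(offset, true) !== LOCAL_FILE_HEADER_SIG) {
    throw new Error(`No local file header at offset ${offset}`);
  }
  const flags = view.getUint16(offset + 6, true);
  const fileNameLength = view.getUint16(offset + 26, true);
  const extraFieldLength = view.getUint16(offset + 28, true);
  return {
    flags,
    hasDataDescriptor: (flags & 0x0008) !== 0, // bit 3
    compressionMethod: view.getUint16(offset + 8, true),
    // With bit 3 set these are usually zero; the real values follow the data.
    crc32: view.getUint32(offset + 14, true),
    compressedSize: view.getUint32(offset + 18, true),
    uncompressedSize: view.getUint32(offset + 22, true),
    // Assumes UTF-8 names; strictly, names are CP437 unless flag bit 11 is set.
    fileName: new TextDecoder().decode(
      bytes.subarray(offset + 30, offset + 30 + fileNameLength)),
    // File data starts after the fixed 30-byte header, the name and the extra field.
    dataOffset: offset + 30 + fileNameLength + extraFieldLength,
  };
}
```

This shows why the TODO is a problem: with bit 3 set, `compressedSize` here is typically 0, so the header alone does not say where the data ends.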

I think this means we may have to rely on the Central Directory structures to know where each local file starts and ends. Each Central Directory record stores the entry's compressed size and the offset of its local file header, which tells us where the file data (and the data descriptor that follows it) ends, so we can properly get the compressed size of each file.
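A minimal sketch of reading those Central Directory records follows, assuming a non-ZIP64 archive that is already fully in memory and offsets from PKWARE's APPNOTE.TXT. It locates the End of Central Directory (EOCD) record by scanning backwards, then walks the fixed 46-byte central file headers. The function names are hypothetical, not bitjs's API.

```javascript
// Hypothetical helpers: find the EOCD record, then read each central file
// header to get compressed sizes and local header offsets.
const EOCD_SIG = 0x06054b50;
const CENTRAL_FILE_HEADER_SIG = 0x02014b50;

function findEndOfCentralDirectory(view) {
  // The EOCD is 22 bytes plus a comment of up to 65535 bytes,
  // so scan backwards from the last position it could start at.
  const minStart = Math.max(0, view.byteLength - 22 - 0xffff);
  for (let i = view.byteLength - 22; i >= minStart; i--) {
    if (view.getUint32(i, true) === EOCD_SIG) return i;
  }
  throw new Error("End of Central Directory record not found");
}

function readCentralDirectory(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const eocd = findEndOfCentralDirectory(view);
  const entryCount = view.getUint16(eocd + 10, true); // total entries
  let p = view.getUint32(eocd + 16, true); // offset of the Central Directory
  const entries = [];
  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(p, true) !== CENTRAL_FILE_HEADER_SIG) {
      throw new Error(`Bad central file header at offset ${p}`);
    }
    const nameLen = view.getUint16(p + 28, true);
    const extraLen = view.getUint16(p + 30, true);
    const commentLen = view.getUint16(p + 32, true);
    entries.push({
      fileName: new TextDecoder().decode(bytes.subarray(p + 46, p + 46 + nameLen)),
      flags: view.getUint16(p + 8, true),
      crc32: view.getUint32(p + 16, true),
      compressedSize: view.getUint32(p + 20, true),
      uncompressedSize: view.getUint32(p + 24, true),
      localHeaderOffset: view.getUint32(p + 42, true),
    });
    p += 46 + nameLen + extraLen + commentLen; // next central file header
  }
  return entries;
}
```

Keying these entries by `localHeaderOffset` gives the compressed size for any local header, whether or not bit 3 is set.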

This will unfortunately require a bit of refactoring. One possibility: on first encountering a data descriptor, stop unzipping local files, scan through the rest of the file to read the Central Directory information, and then resume unzipping.

(This also means that a zip file that uses data descriptors is not really suited to streaming, since the entire file has to be read into the browser before we can reach the Central Directory structures.)
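The "stop, read the Central Directory, resume" idea can be sketched as a loop over local file headers that only consults the Central Directory the first time it meets an entry with bit 3 set. To keep the sketch self-contained, the Central Directory reader is passed in as a hypothetical callback returning `{ localHeaderOffset, compressedSize }` records; `listLocalEntries` is likewise a hypothetical name, and ZIP64 is ignored.

```javascript
// Hypothetical sketch: walk local entries in order; on the first data
// descriptor, pause and load sizes from the Central Directory (via callback).
const LOCAL_SIG = 0x04034b50;
const DESCRIPTOR_SIG = 0x08074b50;

function listLocalEntries(bytes, readCentralDirectory) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let sizesByOffset = null; // built lazily, only if a descriptor shows up
  const entries = [];
  let p = 0;
  while (p + 4 <= bytes.length && view.getUint32(p, true) === LOCAL_SIG) {
    const flags = view.getUint16(p + 6, true);
    const nameLen = view.getUint16(p + 26, true);
    const extraLen = view.getUint16(p + 28, true);
    const dataOffset = p + 30 + nameLen + extraLen;
    let compressedSize = view.getUint32(p + 18, true);
    const usesDescriptor = (flags & 0x0008) !== 0;
    if (usesDescriptor) {
      if (sizesByOffset === null) {
        // First data descriptor: read the whole Central Directory once.
        sizesByOffset = new Map(
          readCentralDirectory(bytes).map((e) => [e.localHeaderOffset, e.compressedSize]));
      }
      compressedSize = sizesByOffset.get(p);
      if (compressedSize === undefined) {
        throw new Error(`No Central Directory entry for local header at ${p}`);
      }
    }
    entries.push({ localHeaderOffset: p, dataOffset, compressedSize });
    p = dataOffset + compressedSize;
    if (usesDescriptor) {
      // The descriptor signature is optional: 16 bytes with it, 12 without.
      const hasSig = p + 4 <= bytes.length && view.getUint32(p, true) === DESCRIPTOR_SIG;
      p += hasSig ? 16 : 12;
    }
  }
  return entries;
}
```

The optional-signature check can in principle be fooled when a CRC-32 happens to equal `0x08074b50`; since the Central Directory is loaded by then, a more careful version could compare against its stored CRC-32.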

wanglingsong commented 4 years ago

That's exactly what https://github.com/Stuk/jszip does. It loads the whole file into memory and reads the Central Directory at the end. You can have a look at their source.

codedread commented 4 years ago

@wanglingsong I decided to go with the slightly easier approach of seeking ahead to the next file signature (or anything else marking the end of all the files in the zip, such as the start of the Central Directory). This means bitjs can still unzip files in chunks, which enables streaming.
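A minimal sketch of the seek-ahead approach, under stated assumptions: non-ZIP64 entries, and only a local file header (`0x04034b50`) or a central file header (`0x02014b50`) can follow an entry's data descriptor. `findEntryEnd` is a hypothetical name, not bitjs's actual implementation. Because compressed data can contain a signature by chance, a candidate is accepted only if the descriptor just before it records a compressed size equal to the distance actually scanned.

```javascript
// Hypothetical sketch: find where a bit-3 entry's data ends by scanning the
// buffered bytes for the next record signature. Returns null if the end is
// not confirmed yet (i.e. the caller should wait for more chunks).
const DESCRIPTOR_SIG = 0x08074b50;
const NEXT_RECORD_SIGS = [0x04034b50 /* local file */, 0x02014b50 /* central dir */];

function findEntryEnd(bytes, dataOffset) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The earliest possible match: empty data plus a 12-byte descriptor.
  for (let i = dataOffset + 12; i + 4 <= bytes.length; i++) {
    if (!NEXT_RECORD_SIGS.includes(view.getUint32(i, true))) continue;
    // Try a descriptor with its optional signature (16 bytes), then without (12).
    for (const descLen of [16, 12]) {
      const d = i - descLen;
      if (d < dataOffset) continue;
      if (descLen === 16 && view.getUint32(d, true) !== DESCRIPTOR_SIG) continue;
      const fields = descLen === 16 ? d + 4 : d; // CRC-32, then the two sizes
      const compressedSize = view.getUint32(fields + 4, true);
      // Reject coincidental signatures inside the compressed data.
      if (compressedSize === d - dataOffset) {
        return {
          crc32: view.getUint32(fields, true),
          compressedSize,
          uncompressedSize: view.getUint32(fields + 8, true),
          nextRecordOffset: i,
        };
      }
    }
  }
  return null; // not found in the bytes buffered so far
}
```

In a chunked reader, a `null` result would mean appending the next chunk to the buffer and calling again with the same `dataOffset`, so the archive never has to be fully loaded to find each entry's size.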