bower / decompress-zip

Module that decompresses zip files
MIT License

Streaming interface? #28

Closed: domharrington closed this issue 10 years ago

domharrington commented 10 years ago

Are there any plans in the pipeline to add a streaming interface to this module?

sheerun commented 10 years ago

https://www.npmjs.org/package/gulp-unzip ?

domharrington commented 10 years ago

Thanks for the reply. Unfortunately that uses node-unzip under the hood which doesn't appear to be very tolerant of corrupted zip files. https://github.com/EvanOxfeld/node-unzip/issues/58

Do you happen to know whether unzipper.list() is quicker than unzipper.extract()? Or are they essentially the same operation under the hood? I'm going to be unzipping large files of ~200 MB (about 1.2 GB unzipped), so I don't want to unzip everything into memory at once.

domharrington commented 10 years ago

I've just run a simple test on my machine comparing how long .list() and .extract() take on a 9 MB file which extracts to 1.2 GB.

List - 0.07s

var DecompressZip = require('decompress-zip');
var unzipper = new DecompressZip(__dirname + '/21724-twhvxo.zip');

unzipper.on('error', function (err) {
  console.log('Caught an error', err);
});

unzipper.on('list', function (files) {
  console.log('The archive contains:');
  console.log(files);
});

unzipper.list();

Extract - 2.91s

var DecompressZip = require('decompress-zip');
var unzipper = new DecompressZip(__dirname + '/21724-twhvxo.zip');

unzipper.on('error', function (err) {
  console.log('Caught an error', err);
});

unzipper.extract({
  path: './extracted'
});

Unlike node-unzip, this module correctly reports errors for corrupt zip files. So I'm going to use it to check the integrity of the zip file and its contents, then use node-unzip to stream the extraction. Thanks for your help.
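A rough sketch of that check-first step, wrapped in a promise so the extraction can be chained after it. The helper name checkArchive is made up for illustration, and the constructor is passed in as a parameter (an assumption made here so the helper does not care how decompress-zip is loaded). It relies only on the list() call and the 'list' and 'error' events used in the snippets above.

```javascript
// Sketch: resolve with the file list if list() succeeds, reject on 'error'.
// `Unzipper` is expected to be the decompress-zip constructor; `checkArchive`
// is a hypothetical helper name, not part of the module.
function checkArchive(Unzipper, zipPath) {
  return new Promise(function (resolve, reject) {
    var unzipper = new Unzipper(zipPath);

    // Attach both handlers before starting, so no event is missed.
    unzipper.on('error', reject);
    unzipper.on('list', resolve);

    unzipper.list();
  });
}
```

It would be called as checkArchive(require('decompress-zip'), zipPath).then(...), handing the path to the streaming extractor only in the success branch. Note that this only tells you what list() can detect; whether that covers every kind of corruption is not something the test above establishes.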

wibblymat commented 10 years ago

@domharrington ZIP files cannot be streamed reliably. This is the main reason why node-unzip is so flaky. To read a ZIP correctly you need to start at the end of the file and seek to various places within it.
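To make the "start at the end" part concrete, here is a minimal sketch of locating the End of Central Directory record, which is the structure at the tail of a ZIP that says where the list of entries lives. It is not how this module is implemented; the function name is invented for illustration, and it assumes a plain single-disk archive with no ZIP64 extensions.

```javascript
// Sketch: find the End of Central Directory (EOCD) record in a ZIP buffer.
// Assumptions: single-disk archive, no ZIP64; names here are illustrative.
const EOCD_SIGNATURE = 0x06054b50; // bytes "PK\x05\x06", read little-endian
const EOCD_MIN_SIZE = 22;          // fixed part of the record, without comment
const MAX_COMMENT = 0xffff;        // comment length is a 16-bit field

function findEndOfCentralDirectory(buf) {
  // The record can only start within the last 22 + 65535 bytes.
  const lowest = Math.max(0, buf.length - EOCD_MIN_SIZE - MAX_COMMENT);

  // Scan backwards, because the trailing comment has variable length.
  for (let pos = buf.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
    if (buf.readUInt32LE(pos) !== EOCD_SIGNATURE) continue;

    // Skip signature bytes that merely appear inside the comment:
    // a real record plus its comment must end exactly at end of file.
    const commentLength = buf.readUInt16LE(pos + 20);
    if (pos + EOCD_MIN_SIZE + commentLength !== buf.length) continue;

    return {
      entryCount: buf.readUInt16LE(pos + 10),
      centralDirectorySize: buf.readUInt32LE(pos + 12),
      centralDirectoryOffset: buf.readUInt32LE(pos + 16)
    };
  }
  return null; // no EOCD record found: not a ZIP, or truncated
}
```

With a real archive you would read only the tail of the file into the buffer, then seek to centralDirectoryOffset to read the entries, and from each entry seek again to its data. None of that is possible on a forward-only stream, where the end of the file arrives last.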