DanielHindi / aws-s3-zipper

takes an amazon s3 bucket folder and zips it for streaming or serializes to a file

zips in s3 bucket are corrupted #3

Closed d3m3tr1s closed 7 years ago

d3m3tr1s commented 8 years ago

Hello Daniel, I'm using your tool inside an AWS Lambda function to zip images stored in an S3 bucket, and I'm running into a problem: for larger image sets the resulting zip is corrupted. I set the Lambda memory limit to the maximum Amazon allows (1536 MB) and still have the issue, even though the CloudWatch log shows the Lambda call only used 300-500 MB and reports successful completion. The zip is created, but it is corrupted. When I repeat the same thing with no more than 4-5 images of 4-5 MB each, it creates a healthy zip.

Any suggestion is highly appreciated.

Thank you for your great tool!
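
For reference, this is roughly the shape of the Lambda handler I'm using (simplified; the bucket and folder names are placeholders, and the zipToS3File call here is an approximation of the module's API rather than my exact code):

// Simplified Lambda handler sketch: zip an S3 folder and write the zip back to S3.
// Names and the zipToS3File option shape are illustrative, not exact.
var S3Zipper = require('aws-s3-zipper');

exports.handler = function (event, context, callback) {
    var zipper = new S3Zipper({
        region: 'us-east-1',          // placeholder region
        bucket: 'my-images-bucket'    // placeholder bucket
    });

    zipper.zipToS3File({
        s3FolderName: 'uploads/large-set',       // folder with the images
        s3ZipFileName: 'archives/large-set.zip'  // where the zip should land
    }, function (err, result) {
        if (err) return callback(err);
        callback(null, result); // reports success even when the resulting zip turns out corrupted
    });
};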

DanielHindi commented 8 years ago

So I use this module on a daily basis and it zips up hundreds of megabytes of images, though in smaller zip increments.

I'm using npm/archiver to zip. It seems there are updates to archiver that may solve this; it needs to be tested.

That being said, do you think 100-200 images at 2-5 MB each would be a good test?

d3m3tr1s commented 8 years ago

Thank you for the tip, I will try updating the archiver module.

Do you use your module wrapped in an AWS Lambda function?

As a side note, I had to change this line

https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L187

to this

var tempFile = '/tmp/__' + Date.now() + '.zip';

to make it work in AWS Lambda, since /tmp is the only writable directory in Lambda.

DanielHindi commented 8 years ago

No, I don't use it in Lambda... it's triggered from an API.

If you make the tempFile configurable and send it in a pull request, I'll accept it.
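
Something along these lines would probably be enough (sketch only; tmpDir is a suggested option name, not something the module exposes today):

// Sketch of a configurable temp directory; "tmpDir" is a suggested option
// name, not an existing setting in the module.
var os = require('os');
var path = require('path');

function S3Zipper(config) {
    // ...existing config handling...
    this.tmpDir = config.tmpDir || os.tmpdir(); // pass '/tmp' (or rely on the default) on Lambda
}

// ...and where the temp zip is created:
var tempFile = path.join(this.tmpDir, '__' + Date.now() + '.zip');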

n7best commented 7 years ago

@d3m3tr1s @DanielHindi

https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L265 https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L303

I think these lines assume all the files will be zipped within that timeframe (1 second), which might be fine for small zip files. Listening for zip.on('finish') would be a better solution to avoid corruption.

ref: https://archiverjs.com/docs/Archiver.html#finalize https://github.com/archiverjs/node-archiver/blob/master/examples/pack-zip.js
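
In other words, something like the pattern from the pack-zip example: register the listeners before calling finalize, and only continue once the output stream reports it is done (sketch, not the module's actual code):

// Event-driven sketch based on the archiver docs: set up listeners first,
// then finalize, and only treat the zip as complete when the output stream
// says so.
var fs = require('fs');
var archiver = require('archiver');

var output = fs.createWriteStream('/tmp/out.zip');
var archive = archiver('zip');

output.on('close', function () {
    // File descriptor closed; the zip on disk is complete and safe to upload.
    console.log(archive.pointer() + ' total bytes written');
});

archive.on('error', function (err) {
    throw err;
});

archive.pipe(output);
// ...append the S3 objects here...
archive.finalize();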

d3m3tr1s commented 7 years ago

Thank you for the tip, @n7best !

DanielHindi commented 7 years ago

The callback should only happen when the file has been zipped and released. The problem is that the module keeps a lock on the file for a few moments after the callback; that's why I put a breathing period before moving on.
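
For clarity, the pattern being discussed is roughly this (simplified, not the exact lines from index.js):

// Simplified view of the "breathe time": finalize, then wait a fixed second
// before invoking the callback in the hope that the file lock has been released.
archive.finalize();
setTimeout(function () {
    callback(null, tempFile);
}, 1000);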

n7best commented 7 years ago

https://github.com/DanielHindi/aws-s3-zipper/blob/master/index.js#L144 Neither finalize nor append from archiver guarantees that the file has been zipped.

DanielHindi commented 7 years ago

The end, close or finish events on the destination stream may fire right after calling this method so you should set listeners beforehand to properly detect stream completion.

Which means there must be a different event or callback that signals the true release of the file:

output.on('close', function() {
  console.log(archive.pointer() + ' total bytes');
  console.log('archiver has been finalized and the output file descriptor has closed.');
});

This might be it. Need to test.

n7best commented 7 years ago

I tested all three; not sure why, but only finish works for me. My files are pretty large.

DanielHindi commented 7 years ago

I believe 1.0.1 has the potential to fix your issue.