brendan-duncan / archive

Dart library to encode and decode various archive and compression formats, such as Zip, Tar, GZip, ZLib, and BZip2.
MIT License
399 stars · 139 forks

DO NOT USE THIS PACKAGE FOR ZIPPING LARGE FILES! #277

Closed: BananaMasterz closed this issue 11 months ago

BananaMasterz commented 1 year ago

This package seems to be failing internally, without any indication or error, for large files. This is not documented anywhere and is very dangerous.

encoder.zipDirectory(Directory(dirToZip), filename: zipResult);

The call above will create a corrupt zip file, without letting you know, if the data you are trying to zip is larger than some threshold (reportedly 4GB).

While I appreciate the dev's work, this is very dangerous!

brendan-duncan commented 1 year ago

This is probably because it's encoding to the old zip format, which has a 4GB limit due to the limitations of the zip format, and of FAT32, at the time. The encoder would have to be rewritten to use the zip extension that allows much larger archive sizes.

I doubt I'll be able to get to it any time soon, but I'll try.

Using a Dart library to encode 10GB seems pretty extreme, though I get that it's what you need for your situation.

BananaMasterz commented 1 year ago

I certainly don't expect you to rewrite the package! Thank you for making it in the first place. I was just surprised, mainly because I didn't see this limit mentioned front and center in the documentation, or at all. I didn't actively think to test larger files, and it's very dangerous not to throw an error.

In my scenario, if I hadn't accidentally tried to test a 10GB file, I would have used this package to zip files, send them to a server, and then delete the originals. That would have caused data loss, since no error is thrown. For now, adding the limit to the pub.dev description and the documentation should be enough to educate users. I would also raise an error instead of making it look like everything went well. Again, I'm not expecting, just suggesting. The title of this issue is only a bit aggressive because of the severity of the problem (data loss) that can occur if one isn't careful.
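The two safeguards suggested here (fail fast on oversized inputs, and verify the archive before deleting the originals) are worth sketching out. The library in question is Dart, but purely for illustration here is a Python version; the function names are my own, not part of the archive package:

```python
import os
import zipfile

# 4 GiB - 1: the maximum value of the 32-bit size fields in the classic zip format.
MAX_CLASSIC_ZIP = 0xFFFFFFFF

def safe_to_zip_classic(paths):
    """Raise before zipping if any input would overflow the classic zip size fields."""
    for p in paths:
        size = os.path.getsize(p)
        if size > MAX_CLASSIC_ZIP:
            raise ValueError(f"{p} is {size} bytes; exceeds the classic zip 4 GiB limit")
    return True

def verify_archive(zip_path):
    """Check every member's CRC before the caller deletes the original files."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # returns the name of the first corrupt member, or None
        if bad is not None:
            raise ValueError(f"corrupt member in {zip_path}: {bad}")
    return True
```

The key point is the order of operations: verify first, delete originals only if verification passes.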

brendan-duncan commented 1 year ago

I'll see if I can check for and raise an error in this case. It didn't do so before because I never thought to check for it. I wish I had more time to work on this library, but I do what I can. Maybe it won't be too hard to add the extended zip format to the encoder; I'll check on that too.

brendan-duncan commented 1 year ago

I started looking into this and tried constructing a test case to reproduce the problem, but my quick-and-dirty experiment hasn't been successful.

My test generated 11GB worth of data files (11k files of 1MB each), zipped and unzipped the directory, and it didn't result in any error.

Are there individual files that are large?

BananaMasterz commented 1 year ago

Yes, individual files are that large.

Run the following in a command prompt to generate a large (10GB) file:

fsutil file createnew test 10000000000
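fsutil is Windows-only; for anyone reproducing this on another platform, a sparse file of arbitrary logical size can be created with a few lines of Python (an illustrative equivalent, not part of the thread's tooling):

```python
def make_sparse_file(path, size):
    """Create a file with the given logical size without writing that many
    bytes; most filesystems allocate the blocks lazily, so this is fast."""
    with open(path, "wb") as f:
        f.truncate(size)

# Example: a 10 GB test file, analogous to the fsutil command above.
# make_sparse_file("test", 10_000_000_000)
```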

After generating the file try zipping it.

In my case I tried to zip a directory that had an already zipped item inside and that's why it was that large. Otherwise it's pretty rare to have individual files that large. (Unless you're a video editor I guess)

brendan-duncan commented 12 months ago

I think I should be able to get this resolved, but it may take a little while; there's a lot of family stuff going on keeping me busy.

nabil-hfz commented 12 months ago

@brendan-duncan

Hi, because I am using this library as well (only for small files), and I have recently worked intensively with system files on Linux, I have a good grounding in dealing with files. I would like to offer to help you refactor it, if you want, as I have some free time now.

brendan-duncan commented 12 months ago

@nabil-hfz PRs are always welcome and very much appreciated. If you want to take a look at fixing up the zip file size limit issues, you are more than welcome to.

The basic zip format is documented at https://docs.fileformat.com/compression/zip. As you can see in that doc, the compressed and uncompressed size fields of the local file header are 4 bytes each, which means it has a 4GB limit. The ZipEncoder currently uses only the basic zip format, and does no error checking to verify that the size of the file actually fits in 32 bits.
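To make the 32-bit ceiling concrete, here is a small sketch (in Python, purely for illustration; the actual encoder is Dart) that packs the fixed part of a classic local file header and rejects sizes that do not fit the 4-byte fields:

```python
import struct

# Fixed part of the classic zip local file header, per the zip spec layout:
# signature, version needed, flags, method, mod time, mod date,
# crc-32, compressed size, uncompressed size, name length, extra length.
LOCAL_FILE_HEADER = "<IHHHHHIIIHH"  # 30 bytes total

def pack_local_header(crc, comp_size, uncomp_size, name_len, extra_len=0):
    """Pack a classic local file header; the size fields are only 4 bytes,
    so anything over 0xFFFFFFFF needs the Zip64 extension instead."""
    if comp_size > 0xFFFFFFFF or uncomp_size > 0xFFFFFFFF:
        raise OverflowError("size exceeds 32-bit field; Zip64 extension required")
    return struct.pack(LOCAL_FILE_HEADER, 0x04034B50, 20, 0, 0, 0, 0,
                       crc, comp_size, uncomp_size, name_len, extra_len)
```

Raising that OverflowError is essentially the missing error check this issue asks for.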

The Zip64 extension was added to get around that limit, and is described in https://en.wikipedia.org/wiki/ZIP_(file_format). The fix for this issue would be to add a Zip64 extension block to the Extra field of the local file header, since the Zip64 extension block uses 8-byte size fields.
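For reference, the Zip64 extended-information extra field is a tiny structure: a 2-byte header ID (0x0001), a 2-byte data length, then the 8-byte size values; the 32-bit fields in the header itself are set to the 0xFFFFFFFF sentinel so readers look in the extra field instead. Sketched in Python for illustration (the real encoder would be Dart):

```python
import struct

ZIP64_EXTRA_ID = 0x0001  # header ID of the Zip64 extended information field

def zip64_extra_field(uncomp_size, comp_size):
    """Build a Zip64 extra field carrying 8-byte sizes. The corresponding
    32-bit header fields should then be written as 0xFFFFFFFF."""
    data = struct.pack("<QQ", uncomp_size, comp_size)  # uncompressed first, per spec
    return struct.pack("<HH", ZIP64_EXTRA_ID, len(data)) + data
```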

I can hopefully get to it soon, I don't think it should be too hard.

brendan-duncan commented 11 months ago

I still need to do more testing, but I think I have Zip64 encoding working. You can test it with the git version; I'll publish after I test some more.

It should also reduce the memory used when encoding files, because before it had to read in the whole file to calculate the CRC-32 before writing it to disk, and now it does that in smaller chunks.
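The chunked CRC-32 approach described above is straightforward to sketch. In Python (for illustration only; the package itself is Dart), incremental CRC looks like this:

```python
import zlib

def crc32_streaming(path, chunk_size=64 * 1024):
    """Compute a file's CRC-32 incrementally, so the whole file never
    has to sit in memory at once."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # fold each chunk into the running CRC
    return crc & 0xFFFFFFFF
```

The result is identical to a single-pass CRC over the whole file, which is what makes the streaming rewrite safe.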

brendan-duncan commented 11 months ago

I tested it by zipping an 11GB file and it seemed to work.

PerArneng commented 11 months ago

Really appreciate this work @brendan-duncan 💯

BananaMasterz commented 11 months ago

That's great, thanks a lot for your time and achievement. I know it will help a lot of people. I also appreciate how quickly you did it.

Thanks!

(You can close the issue anytime with the updated version as reference, otherwise I can close it)

brendan-duncan commented 11 months ago

Published updates in 3.4.0.