brendan-duncan / archive

Dart library to encode and decode various archive and compression formats, such as Zip, Tar, GZip, ZLib, and BZip2.
MIT License
399 stars 139 forks source link

Fixed InputFileStream taking a memory footprint of 1MB x files count #285

Closed androidseb closed 11 months ago

androidseb commented 11 months ago

Fixing a bug introduced in version 3.4.3 with https://github.com/brendan-duncan/archive/commit/5699f93e49b4c091646b2348b03d49eb8bc54d88: this change makes InputFileStream.clone return a 1MB Uint8List, while ZipDirectory.read calls InputFileStream.clone and stores the result for each file in the archive. Because of that, attempting to read an archive containing 100k files would have a 100GB memory footprint.

List of the changes I made:

androidseb commented 11 months ago

@brendan-duncan FYI the test adds a 9MB zip file, along with a test taking about 2 full second on my machine, I thought it was the best way to secure regressions. If you feel like the test is too heavy, we could limit it to only testing the number of files, or even remove it completely...

brendan-duncan commented 11 months ago

Thanks for catching that!

brendan-duncan commented 11 months ago

I'll look into rewriting the file stream class to use a shared file buffer

brendan-duncan commented 11 months ago

I have the input file stream rewritten to use a shared file buffer now. Just doing a bunch of tests to make sure I didn't mess it up and it improves things.

brendan-duncan commented 11 months ago

Rewritten version using a shared file buffer has been pushed and published. I checked your 100k test and it ran about 2 seconds with your fix, and less than 1 second with the new version, on my computer.

androidseb commented 11 months ago

Hey @brendan-duncan, just FYI the version 3.4.5 you mentioned is not published on pub.dev yet, only in your changelog for now :-)

brendan-duncan commented 11 months ago

I guess I quit right before the finish line :-). Published.

androidseb commented 11 months ago

After testing 3.4.5, I wanted to post back here to provide insights about the performance improvements I have noticed - I have been testing in a beta version of my Flutter app on an iPad, importing a 5GB KMZ file containing 50k images:

So in that real-world device example, having that shared buffer seems to significantly boost the zip index parsing (halved the processing time there!) and also decreases the time required to extract zip files by 15%.

Nice work !!! :rocket:

brendan-duncan commented 11 months ago

That's awesome, thanks for letting me know. With so many devices and platforms with their own limitations, it's hard to know if any change will help or hurt.