brendan-duncan / archive

Dart library to encode and decode various archive and compression formats, such as Zip, Tar, GZip, ZLib, and BZip2.
MIT License

InputFileStream bufferSize is always 8 bytes #283

Closed: grundid closed this issue 1 year ago

grundid commented 1 year ago

While dealing with large files, I noticed that reads through InputFileStream are very slow. Looking at the buffer init code, it appears the buffer size is always 8 bytes or less:

    _buffer = Uint8List(min(bufferSize, 8));

Changing min to max fixes the issue easily. ;)
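
To illustrate why that one-word change matters, here is a minimal sketch that isolates the sizing expression. The function names `allocateBufferBuggy` and `allocateBufferFixed` are hypothetical and are not part of the archive package; only the `min`/`max` expression comes from the report above.

```dart
import 'dart:math';
import 'dart:typed_data';

/// Hypothetical stand-in for the reported sizing logic.
/// min() makes 8 an upper bound, so any requested size above 8
/// is silently reduced to 8 bytes.
Uint8List allocateBufferBuggy(int bufferSize) =>
    Uint8List(min(bufferSize, 8));

/// Hypothetical stand-in for the suggested fix.
/// max() makes 8 a lower bound, so the requested size is honored
/// and only very small requests are rounded up to 8 bytes.
Uint8List allocateBufferFixed(int bufferSize) =>
    Uint8List(max(bufferSize, 8));
```

With a requested size of 1024, the first function yields an 8-byte buffer and the second a 1024-byte buffer.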

brendan-duncan commented 1 year ago

Well, that's embarrassing :-) Thanks for catching that, fix coming shortly.

grundid commented 1 year ago

Thanks for looking into it.

I would like to suggest a larger default buffer size, maybe 0.1% of the file size. That would always result in at most 1000 I/O operations. I'm working on a file that is 60 GB in size, and it took over 1 hour to process with this 8-byte buffer. I then increased the buffer size to 50 MB and the whole file was processed within a few minutes.
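
The arithmetic behind that suggestion can be sketched as follows. Both helpers, `suggestedBufferSize` and `readCount`, are hypothetical names used only for illustration, and the sketch assumes each read fills the whole buffer. It also applies no upper cap, so memory use grows with file size.

```dart
import 'dart:math';

/// Hypothetical helper: 0.1% of the file size, rounded up so that
/// 1000 full buffers always cover the file, with an 8-byte floor.
int suggestedBufferSize(int fileSize, {int minSize = 8}) {
  final proportional = (fileSize + 999) ~/ 1000; // ceil(fileSize / 1000)
  return max(proportional, minSize);
}

/// Number of buffer refills needed to read [fileSize] bytes,
/// assuming every refill reads a full [bufferSize] bytes.
int readCount(int fileSize, int bufferSize) =>
    (fileSize + bufferSize - 1) ~/ bufferSize; // ceil(fileSize / bufferSize)
```

For a 60 GiB file (60 * 1024 * 1024 * 1024 bytes), the proportional buffer comes to roughly 64 MB and needs 1000 refills, whereas an 8-byte buffer needs about 8 billion refills.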

brendan-duncan commented 1 year ago

Yeah, file IO in Dart is really slow.

A 60GB file processed in Dart! I didn't imagine people doing that when this was written :-). But I'm definitely seeing people deal with larger files now.

I pushed the fix to 3.4.3. I increased the default buffer size to 1MB for now, until I have a better sense of how much memory is too much for Dart.

brendan-duncan commented 1 year ago

There were some issues with the last version, so I pushed a new version to git and will publish soon. The old version allocated a file buffer for every InputFileStream, even the sub-file streams, so a zip with 100k files would hold 100k * 1MB of cache (roughly 100GB). The new version uses a shared file cache.
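
To make the memory difference concrete, here is a rough sketch of the two layouts. The classes `SharedFileCache` and `SubStream` and both helper functions are hypothetical illustrations of the idea described above. They are not the archive package's actual implementation.

```dart
import 'dart:typed_data';

/// Hypothetical shared cache: a single buffer referenced by many streams.
class SharedFileCache {
  SharedFileCache(int bufferSize) : buffer = Uint8List(bufferSize);
  final Uint8List buffer;
}

/// Hypothetical sub-file stream: it records its window into the file
/// and points at a shared cache instead of owning a buffer.
class SubStream {
  SubStream(this.cache, this.offset, this.length);
  final SharedFileCache cache;
  final int offset;
  final int length;
}

/// Total buffer memory if every stream allocates its own buffer.
int perStreamBytes(int streamCount, int bufferSize) =>
    streamCount * bufferSize;

/// Total buffer memory when streams share caches: each distinct
/// cache instance is counted once, however many streams use it.
int sharedBytes(List<SubStream> streams) => streams
    .map((s) => s.cache)
    .toSet()
    .fold<int>(0, (sum, cache) => sum + cache.buffer.length);
```

With 100,000 streams and a 1 MiB buffer each, `perStreamBytes` gives 104,857,600,000 bytes, while any number of sub-streams pointing at one `SharedFileCache` of the same size account for 1,048,576 bytes in total.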

brendan-duncan commented 1 year ago

Published in 3.4.5.