adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

Why is Sharpcompress slow to compress large files #818

Open oufu99 opened 3 months ago

oufu99 commented 3 months ago

I used SharpCompress to compress a folder into a zip file. There is only one file in this folder, a 1.8 GB mkv file, and it took 2 minutes to compress. Why does it take so long? Is there a problem with my code or environment?

SharpCompress version: 0.36.0

```csharp
using (var archive = ZipArchive.Create())
{
    archive.AddAllFromDirectory(@"E:\test\");
    archive.SaveTo(@"E:\temp.zip", CompressionType.Deflate);
}
```

btomblinson commented 3 months ago

I don’t care what file type it is; 1.8 GB is massive. I’ve compressed much smaller files and seen similar times using 7-Zip or the native Windows zip utility. Or try using VLC to convert that file to a .mp4 and clock that. With a file that size, any CPU-intensive task makes hardware the most important factor.

If you upload the file I can try to benchmark it, but unless @adamhathcock disagrees, you need to provide more evidence that this library compresses that much slower.

adamhathcock commented 3 months ago

I agree: a 1.8 GB file can take a long time. I'm not saying this library will be the fastest, but any compression on a file that size will take a while.

This library allows more fine-grained control and forward-only access, which matters because you don't have to buffer files in memory. It's not going to be the fastest for raw compression time, especially compared to a C++ implementation.
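For illustration, here is a minimal sketch of that forward-only style using SharpCompress's writer API. The `DeflateCompressionLevel` knob on `ZipWriterOptions` and the `movie.mkv` path are assumptions; lower levels trade compression ratio for speed:

```csharp
using System.IO;
using SharpCompress.Common;
using SharpCompress.Compressors.Deflate;
using SharpCompress.Writers.Zip;

// Forward-only write: each entry is streamed straight to the output
// stream, so the 1.8 GB input is never buffered in memory.
using (var output = File.OpenWrite(@"E:\temp.zip"))
using (var writer = new ZipWriter(output, new ZipWriterOptions(CompressionType.Deflate)
{
    // Assumed option: lower levels compress faster at the cost
    // of a larger archive.
    DeflateCompressionLevel = CompressionLevel.BestSpeed
}))
using (var input = File.OpenRead(@"E:\test\movie.mkv")) // hypothetical file
{
    writer.Write("movie.mkv", input, null);
}
```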

7zip as a format and library might be better for you, since it allows multi-threaded compression, but at the cost that you cannot access the format in a forward-only manner.

btomblinson commented 3 months ago

An alternative would be to wrap the SharpCompress logic in a method and use Task-based asynchronous programming in your app if compressing the file is blocking it. SharpCompress itself does not have asynchronous methods, but it can be wrapped inside one.
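A minimal sketch of that wrapping, reusing the paths from the original snippet. `Task.Run` moves the CPU-bound work onto a thread-pool thread so the caller isn't blocked, but it does not make the compression itself any faster:

```csharp
using System.Threading.Tasks;
using SharpCompress.Archives;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;

public static class ZipHelper
{
    // Offload the synchronous, CPU-bound compression to the thread
    // pool so it does not block the caller (e.g. a UI thread).
    public static Task CompressDirectoryAsync(string sourceDir, string zipPath) =>
        Task.Run(() =>
        {
            using (var archive = ZipArchive.Create())
            {
                archive.AddAllFromDirectory(sourceDir);
                archive.SaveTo(zipPath, CompressionType.Deflate);
            }
        });
}

// Usage: await ZipHelper.CompressDirectoryAsync(@"E:\test\", @"E:\temp.zip");
```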

adamhathcock commented 3 months ago

I've been reluctant to try async methods because Streams often don't truly implement async, and compression is CPU-bound, so async doesn't help.

Putting things on their own thread can help with perceived performance if you don't want to lock your UI or something.

oufu99 commented 3 months ago

> I don’t care what file type it is; 1.8 GB is massive. I’ve compressed much smaller files and seen similar times using 7-Zip or the native Windows zip utility. Or try using VLC to convert that file to a .mp4 and clock that. With a file that size, any CPU-intensive task makes hardware the most important factor.
>
> If you upload the file I can try to benchmark it, but unless @adamhathcock disagrees, you need to provide more evidence that this library compresses that much slower.

What confuses me is that using the same compression method on data of different sizes takes exponentially more time. For example, compressing a 100 MB file takes only 1 second, 500 MB takes 8 seconds, 1 GB takes 30 seconds, and 2 GB takes about 2 minutes.

adamhathcock commented 3 months ago

In that case, there's probably something holding onto memory when it shouldn't be. Pooling might fix it.

Using dotMemory or a similar memory profiler can reveal it.

abelbraaksma commented 1 month ago

@oufu99, you may be hitting paging issues (responding to your "exponential" comment). But there are so many variables at play with performance that it's impossible to say unless you help us help you.

Please post a minimal repro here, in code, that shows what methods you're using for the timings, and give your hardware setup. If we cannot repro it, it's very hard to give anything but the most general advice, I'm sure you'll understand.
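As an illustration of the kind of repro that would help, here is a sketch that times the original snippet against inputs of increasing size with `Stopwatch`; the directory names are hypothetical, each assumed to hold a single file of roughly the indicated size:

```csharp
using System;
using System.Diagnostics;
using SharpCompress.Archives;
using SharpCompress.Archives.Zip;
using SharpCompress.Common;

// Run the same compression code against inputs of increasing size
// to check how the duration scales (linear vs. super-linear).
foreach (var dir in new[] { @"E:\test100mb\", @"E:\test500mb\", @"E:\test1gb\", @"E:\test2gb\" })
{
    var sw = Stopwatch.StartNew();
    using (var archive = ZipArchive.Create())
    {
        archive.AddAllFromDirectory(dir);
        archive.SaveTo(dir.TrimEnd('\\') + ".zip", CompressionType.Deflate);
    }
    sw.Stop();
    Console.WriteLine($"{dir}: {sw.Elapsed}");
}
```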