MiloszKrajewski / K4os.Compression.LZ4

LZ4/LH4HC compression for .NET Standard 1.6/2.0 (formerly known as lz4net)
MIT License
675 stars 77 forks source link

Replace K4os.Hash.xxHash with System.IO.Hashing #80

Closed eerhardt closed 1 year ago

eerhardt commented 1 year ago

The XxHash32 is implemented in System.IO.Hashing now. Switching the usages over to this package.

The System.IO.Hashing 8.0-preview1 package has the HashToUInt32 and GetCurrentHashAsUInt32 APIs. Using it for now, until 8.0 is released. If we want to use a stable package, I can use the 7.0 package, but will need a few wrapper methods to implement this functionality.

@MiloszKrajewski - if we take this approach, would it make sense to continue having 2 separate NuGet packages for LZ4? Or could we combine them into a single NuGet package?

MiloszKrajewski commented 1 year ago

I did look at this before, but System.IO.Hashing.XxHash32 is a class not a struct so would create additional GC pressure while literally recent work was focused on removing GC pressure wherever I could. So, I would be reluctant. It I can find any actual benefit (for example, hypothetically: "despite additional GC pressure the performance is 10% better"), then I could consider it.

eerhardt commented 1 year ago

If it was changed to cache/pool an XxHash32 instance and reuse it, that would reduce the GC pressure. It wouldn't need to allocate. Would that be acceptable?

An actual benefit is applications that use the K4os.Compression.LZ4 library, and also need System.IO.Hashing because some other code/library uses it. The duplicate code bloats memory usage and size on disk.

MiloszKrajewski commented 1 year ago

If it was changed to cache/pool an XxHash32 instance and reuse it, that would reduce the GC pressure. It wouldn't need to allocate. Would that be acceptable?

Pooling is an option definitely, but I would still need to measure performance for it. There are potentially two benchmarks to be performed: a) 64k random buffer hashing only (so the speed of hashing itself) b) small LZ4 compression/decompression (impact of memory allocation)

Would you like to add such benchmarks?

An actual benefit is applications that use the K4os.Compression.LZ4 library, and also need System.IO.Hashing because some other code/library uses it. The duplicate code bloats memory usage and size on disk.

This is not a very compelling argument, to be honest. I mean, I was recently in some other performance focused project sacrificing 1MB of memory for 10% performance gain, so 13KB if disk space is not really a good argument.

if we take this approach, would it make sense to continue having 2 separate NuGet packages for LZ4? Or could we combine them into a single NuGet package?

I do have an OCD related to dependency purity. I actually have some regrets having dependency on System.IO.Pipelines and I was thinking about moving it to separate assembly (but it turned out was not a trivial task and would require some trade-offs).

It would also be a little bit self-contradictory, justifying it as "removing dependency" to immediately add a bigger one. They are System.* ones, but they are still just nuget packages to be downloaded, AFAIU.

Anyway, the focus of this library is performance (with occasional compromises), so that's a thing to measure and it will drive decisions.

eerhardt commented 1 year ago

Closing this PR until performance numbers can prove it adds value.