bp74 / Zstandard.Net

A Zstandard wrapper for .Net

Missing features / contributions welcome? #13

Open EamonNerbonne opened 5 years ago

EamonNerbonne commented 5 years ago

I noticed that this library and the related https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet have only partially overlapping sets of functionality.

Are you interested in external contributions to fill out the gaps; and if so, how do you want those?

I could think of

TL;DR are you interested in contributions, and if so how/what kind/etc?

bp74 commented 5 years ago

Hi, I thought that training is mostly done with the console application that comes with ZSTD. Do you think that this should be done with the .Net library? Regarding the Span feature: yes, this would be nice; the new memory features will show up in pretty much all .Net libraries in the future.

EamonNerbonne commented 5 years ago

Well "should" - that depends on the use case :-).

But yeah, for me it would be nice. I'm intending to use this to compress documents in what's essentially a document-database, and that means that the dictionary is dynamic: it's going to be based on a sample of actual data; and there are likely going to be a bunch of dictionaries (clustered somehow, e.g. based on document type and/or client), and the dictionaries are likely to be occasionally regenerated (to adapt to changing data distributions or simply leverage the fact that time is a reasonable predictor for a compressor).

But even for a fixed database it's a little simpler if the same tool can both train the dictionary and use it.

I mean, for some people this is purely a disadvantage, because it causes some amount of library bloat. But if you're really going to leverage the small-content advantages dictionaries provide, you kind of want to be able to make dictionaries. The size bloat appears to be fairly small, based on the fact that the DLLs in https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet are actually much smaller than the current 1.3.8 DLLs; and in any case, if you really care about size, a more significant win is to pick a single bitness rather than include both 32-bit and 64-bit. But I haven't checked yet what the bloat is using the 1.3.8 version of the codebase.
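To make the small-content advantage concrete, here is a minimal sketch in Python. It does not use zstd or either .NET wrapper: it swaps in the preset-dictionary support (`zdict`) of the standard library's zlib as a stand-in. `build_dictionary`, `make_doc`, and the one-dictionary-per-document-type layout are hypothetical names made up for illustration. A real zstd trainer selects useful substrings from the samples; this sketch simply concatenates them.

```python
import json
import zlib


def build_dictionary(samples, max_size=32 * 1024):
    """Naive stand-in for a trainer: concatenate samples, keep the last max_size bytes."""
    return b"".join(samples)[-max_size:]


def compress(data, dictionary=None):
    """Deflate data, optionally primed with a preset dictionary."""
    if dictionary:
        c = zlib.compressobj(level=9, zdict=dictionary)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress(blob, dictionary=None):
    """Inflate blob; the same dictionary used for compression must be supplied."""
    d = zlib.decompressobj(zdict=dictionary) if dictionary else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


def make_doc(doc_type, i):
    """Hypothetical small JSON document, standing in for real database content."""
    return json.dumps({
        "type": doc_type,
        "id": i,
        "customer": f"client-{i % 7}",
        "status": "open",
        "currency": "EUR",
        "lines": [{"sku": f"A{i}", "qty": i % 5 + 1}],
    }).encode()


# One dictionary per cluster (here: per document type), built from a sample of that cluster.
samples_by_type = {t: [make_doc(t, i) for i in range(50)] for t in ("invoice", "order")}
dictionaries = {t: build_dictionary(s) for t, s in samples_by_type.items()}

doc = make_doc("invoice", 1234)
plain = compress(doc)
with_dict = compress(doc, dictionaries["invoice"])

# Round-trip check, then compare sizes: raw, compressed, compressed with dictionary.
assert decompress(with_dict, dictionaries["invoice"]) == doc
print(len(doc), len(plain), len(with_dict))
```

The exact sizes depend on the data, but for small, similar documents the dictionary-primed output should be clearly smaller. Note that decompression needs the exact dictionary that was used for compression, so if dictionaries are regenerated over time, each stored document has to record which dictionary version it was compressed with.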