Missing features / contributions welcome?

EamonNerbonne commented 5 years ago

I noticed that this library and the related https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet have only partially overlapping sets of functionality.

Are you interested in external contributions to fill out the gaps; and if so, how do you want those?

I can think of:

adding API surface area corresponding to ZDICT_trainFromBuffer (this would be hugely useful to me, but may require a different build of libzstd.dll, since the prebuilt releases at https://github.com/facebook/zstd/releases aren't compiled with the optional dictBuilder package).
adding an API better suited to (de)compressing small payloads, i.e. one using Span<T> instead of Stream, at least if benchmarks show that this amounts to a meaningful perf win.
perhaps there are other features of the underlying C API that would be useful and simple enough to expose?
a little more tenuously, splitting the library into an as-thin-as-possible safe wrapper around the native library and a second wrapper that turns it into more conventional .NET APIs (akin to SQLitePCL.raw). The advantages: the "nice" managed wrapper need not evolve at quite the same rate as the underlying native library; a raw library makes it easier to expose all the crazy bits without first deciding on a clean API for them (e.g. this could be a simple way to include a dictBuilder); and conversely, it allows experimenting with clean managed APIs without polluting a library you want to keep stable and clean.
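For the ZDICT_trainFromBuffer item, one practical detail any wrapper has to handle is the input layout: in zstd's zdict.h, ZDICT_trainFromBuffer takes all training samples concatenated into one contiguous buffer plus a parallel array of sample sizes, rather than a list of separate buffers. The C sketch below shows only that packing step. The SampleSet type and the pack_samples/free_samples helpers are hypothetical (not part of zstd or Zstandard.Net), and it assumes text samples held as NUL-terminated strings. It does not call zstd itself, so it builds without libzstd.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type. Holds samples in the layout that zstd's
 * ZDICT_trainFromBuffer() expects (declared in zdict.h):
 *
 *   size_t ZDICT_trainFromBuffer(void *dictBuffer, size_t dictBufferCapacity,
 *                                const void *samplesBuffer,
 *                                const size_t *samplesSizes,
 *                                unsigned nbSamples);
 *
 * The call itself is not made here. */
typedef struct {
    unsigned char *buffer; /* all sample bytes, concatenated */
    size_t *sizes;         /* sizes[i] = length of sample i */
    unsigned count;        /* number of samples (nbSamples) */
    size_t total;          /* sum of all sizes */
} SampleSet;

/* Packs NUL-terminated text samples into one flat buffer plus a sizes array.
 * Returns 0 on success, -1 on allocation failure. */
int pack_samples(const char *const *samples, unsigned count, SampleSet *out) {
    size_t total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += strlen(samples[i]);
    }

    /* Allocate at least one byte/element so malloc(0) is never called. */
    out->buffer = malloc(total > 0 ? total : 1);
    out->sizes = malloc((count > 0 ? count : 1) * sizeof *out->sizes);
    if (out->buffer == NULL || out->sizes == NULL) {
        free(out->buffer);
        free(out->sizes);
        return -1;
    }

    size_t offset = 0;
    for (unsigned i = 0; i < count; i++) {
        size_t len = strlen(samples[i]);
        memcpy(out->buffer + offset, samples[i], len);
        out->sizes[i] = len;
        offset += len;
    }
    assert(offset == total); /* the sizes must add up to the buffer length */

    out->count = count;
    out->total = total;
    return 0;
}

/* Releases the memory owned by a SampleSet and clears its pointers. */
void free_samples(SampleSet *s) {
    free(s->buffer);
    free(s->sizes);
    s->buffer = NULL;
    s->sizes = NULL;
}
```

A managed wrapper would presumably do the equivalent packing before calling into the native library, then pass buffer, sizes and count as samplesBuffer, samplesSizes and nbSamples, and check the returned size_t with ZDICT_isError().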

TL;DR are you interested in contributions, and if so how/what kind/etc?

bp74 commented 5 years ago

Hi, I thought that training is mostly done with the console application that comes with ZSTD. Do you think this should be done with the .NET library? Regarding the Span feature: yes, this would be nice; the new memory features will show up in pretty much all .NET libraries in the future.

EamonNerbonne commented 5 years ago

Well "should" - that depends on the use case :-).

But yeah, for me it would be nice. I'm intending to use this to compress documents in what's essentially a document database, which means the dictionaries are dynamic: they'll be based on a sample of actual data, there are likely going to be a bunch of them (clustered somehow, e.g. by document type and/or client), and they're likely to be regenerated occasionally (to adapt to changing data distributions, or simply to leverage the fact that time is a reasonable predictor for a compressor).
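Per-cluster, periodically regenerated dictionaries imply some bookkeeping on the application side. The following is a minimal C sketch, assuming a hypothetical in-memory table keyed by document type and client; DictEntry, find_dict and needs_retraining are illustrative names, not part of zstd or Zstandard.Net.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical registry entry: one current dictionary per cluster,
 * where a cluster is identified here by (document type, client). */
typedef struct {
    const char *doc_type; /* cluster key, part 1 */
    const char *client;   /* cluster key, part 2 */
    unsigned dict_id;     /* application-level id of the current dictionary */
    time_t trained_at;    /* when that dictionary was trained */
} DictEntry;

/* Linear lookup of the current dictionary for a cluster.
 * Returns NULL if the cluster has no dictionary yet; the caller
 * could then compress without a dictionary. */
const DictEntry *find_dict(const DictEntry *entries, size_t n,
                           const char *doc_type, const char *client) {
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(entries[i].doc_type, doc_type) == 0 &&
            strcmp(entries[i].client, client) == 0) {
            return &entries[i];
        }
    }
    return NULL;
}

/* Returns nonzero once the dictionary is older than max_age_seconds,
 * signalling that it should be retrained on a fresh data sample. */
int needs_retraining(const DictEntry *entry, time_t now, double max_age_seconds) {
    return difftime(now, entry->trained_at) > max_age_seconds;
}
```

Because a zstd frame can only be decompressed with the dictionary that was used to compress it, retraining a cluster would update its entry to point at the new dictionary, while older dictionaries still have to be kept for documents already compressed with them.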

But even for a fixed database it's a little simpler if the same tool can be used both to train the dictionary and to use it.

I mean, for some people this is purely a disadvantage, because it causes some amount of library bloat. But if you really want to leverage the advantages dictionaries provide for small content, you pretty much need to be able to make dictionaries. The size bloat appears to be fairly small, judging by the fact that https://github.com/skbkontur/ZstdNet/tree/master/ZstdNet's versions of the DLLs are actually much smaller than the current 1.3.8 DLLs; and in any case, if you really care about size, a more significant win is to pick one bitness rather than include both 32- and 64-bit builds. But I haven't yet checked what the bloat is when building from the 1.3.8 codebase.

bp74 / Zstandard.Net

Missing features / contributions welcome? #13