zstd filter: add support for parallel processing

marxin commented 1 year ago

As you probably know, zstd is famous for its fast (and stable) parallel compression capability. It would be great if iostreams learned this skill too.

I've got a WIP parallel implementation of the io::multichar_output_filter filter: boost-zstd.cc.txt

marxin commented 1 year ago

@rdoeffinger

rdoeffinger commented 1 year ago

There are 3 main stumbling blocks:

1. It seems there is no API designed for setting filter options in a generic way. While it's possible to just add a function for setting multithreading specifically, it would be better if it could support more, like setting compression levels in a standard way for all compression filters, allowing users to set any new options that are added to zstd, etc. Designing this is not easy.
2. It seems zstd does not have an option for "choose a good amount of worker threads", it only accepts an explicit number. That makes it a pain to use and hard to provide a nice "enable multithreading" option.
3. I don't have a use-case that benefits much from this, so I would be in no position to test if it works well and if there are any performance bottlenecks (e.g. too much flushing being added by default).

marxin commented 1 year ago

> it seems zstd does not have an option for "choose a good amount of worker threads", it only accepts an explicit number. That makes it a pain to use and hard to provide a nice "enable multithreading" option

Yeah, one might want to use the auto-detection capability of the library: --auto-threads={physical,logical} (default: physical): When using a default amount of threads via -T0, choose the default based on the number of detected physical or logical cores.

> I don't have a use-case that benefits much from this, so I would be in no position to test if it works well and if there are any performance bottlenecks (e.g. too much flushing being added by default).

Heh. I think every time you compress something bigger you want to have it done as fast as possible. That's why using gzip won't work, as it only offers single-threaded compression.

rdoeffinger commented 1 year ago

> Yeah, one might want to use the auto-detection capability of the library

My understanding is that the library does not have this capability, only the binary. But if there is a library API to select this mode it would indeed be helpful.
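One way a filter could offer an "auto" setting without library support is to resolve the worker count itself before handing an explicit number to zstd. The following is only a sketch: `resolve_worker_count` is a hypothetical helper, not part of Boost.Iostreams or zstd, and it assumes that a requested value of 0 means "auto", mirroring the CLI's -T0.

```cpp
#include <thread>

// Hypothetical helper, not part of Boost.Iostreams or zstd.
//   requested > 0  : the caller asked for an explicit number of workers.
//   requested == 0 : "auto", derive the number from the detected core count.
// `detected` is the value reported by std::thread::hardware_concurrency(),
// which the standard allows to be 0 when the count cannot be determined.
inline unsigned resolve_worker_count(unsigned requested, unsigned detected)
{
    if (requested != 0)
        return requested;
    return detected != 0 ? detected : 1;  // unknown core count: use one worker
}

// Convenience overload that queries the running machine.
inline unsigned resolve_worker_count(unsigned requested)
{
    return resolve_worker_count(requested, std::thread::hardware_concurrency());
}
```

Note that std::thread::hardware_concurrency() is only a hint and usually counts logical threads, so this is closer to --auto-threads=logical than to the CLI's default of physical cores.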

> Heh. I think every time you compress something bigger you want to have it done as fast as possible.

Not if the thing producing the data is producing it at a fairly leisurely pace. It just runs for long enough that you need compression. Also all cases where you have a lot of different things to compress are better handled by parallelism across the items to compress, not by parallelizing the compression. But anyway, not the main point and not an argument against the feature, just that this complicates developing and testing such a feature for me (plus, reduces motivation to do so).

marxin commented 1 year ago

Sure, I understand it's not an exciting use case for you. In my case, I stream data to a single file where I would like to apply compression. And that's why boost IO streams come in handy and it would help me to save a lot of time.

rdoeffinger commented 1 year ago

I had a bit of a look. The good news: it's easy to add support for extra parameters via the zstd_params struct. The bad news: the library functions to set parallel compression are only available from version 1.4, so we would need to figure out how to handle that. While it's possible to automatically use these new functions when the header is 1.4 or newer, that would mean that a binary built against 1.4 will no longer run on a system that has 1.3, whereas it would have worked fine before. Not sure if that is really a problem, but it's a risk. Then there is also the question of what to do if multithreading was requested but turns out not to be available. There is the uglier, hard-core option of just giving access to the cstream_ and dstream_ variables, leaving it to the user to figure it out and set these options, but that seems like bad design.
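For reference, a version gate of this kind boils down to one integer comparison. zstd packs its version as MAJOR * 10000 + MINOR * 100 + RELEASE, exposed as the macro ZSTD_VERSION_NUMBER for the headers used at build time and as ZSTD_versionNumber() for the library loaded at run time. The sketch below reproduces just that arithmetic; `zstd_version` and `has_worker_param` are hypothetical names, and the 1.4.0 cutoff is the one given in this thread.

```cpp
// Same packing that zstd uses for ZSTD_VERSION_NUMBER, so 1.4.0 -> 10400.
// Hypothetical helper for illustration only.
constexpr unsigned zstd_version(unsigned major, unsigned minor, unsigned release)
{
    return major * 10000u + minor * 100u + release;
}

// Hypothetical predicate: true when the given packed version is at least
// 1.4.0, the cutoff named in this thread for the parallel-compression calls.
constexpr bool has_worker_param(unsigned version_number)
{
    return version_number >= zstd_version(1, 4, 0);
}

static_assert(has_worker_param(zstd_version(1, 4, 0)), "1.4.0 qualifies");
static_assert(!has_worker_param(zstd_version(1, 3, 8)), "1.3.x does not");
```

A gate on the header version is decided at build time, so it does not remove the concern above: a binary that was compiled against 1.4 and references the newer functions still needs a 1.4 or newer library when it is loaded.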

marxin commented 1 year ago

Well, ZSTD 1.4 is more than 4 years old, so I would not spend much time on older ZSTD releases. And yes, if MT is not available (or disabled in libzstd), then single-threaded mode seems to me like a reasonable fallback.
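The fallback could look roughly like the sketch below. `set_workers_or_fallback` is a hypothetical helper, and `try_set` is a stand-in for the real call so that the logic can be shown without linking libzstd; in a filter it would wrap something like `!ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, n))`, which reports an error when libzstd was built without multithreading.

```cpp
#include <functional>

// Hypothetical helper, not part of Boost.Iostreams or zstd.
// `try_set` stands in for the library call that sets the worker count and
// returns true when the library accepted it.
// Returns the number of workers actually in effect; 0 means single-threaded.
inline int set_workers_or_fallback(int requested,
                                   const std::function<bool(int)>& try_set)
{
    if (requested > 0 && try_set(requested))
        return requested;
    return 0;  // MT not requested or not available: stay single-threaded
}
```

Returning the effective worker count lets the caller log or report the downgrade instead of failing silently.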

boostorg / iostreams

zstd filter: add support for parallel processing #163