Open haampie opened 2 years ago
@haampie Could this be handled by a generic wrapper around `zstd` which implements the jobserver protocol and executes `zstd` appropriately under the hood?
At the API integration level, `libzstd` offers the concept of a shared thread pool, which makes it possible to limit the total number of threads used for compression, whatever the number of concurrent requests.

However, if your question is about invoking the `zstd` CLI, then indeed, the solution should rather come from a wrapper above `zstd`, which would control the total number of instances active at the same time (like `make` controls the number of active `gcc` instances).
That being said, it seems you are looking for something even more complex, which would not only control the total number of instances active at the same time, but also allow each instance to use multiple threads, dynamically resized depending on workload, and yet ensure that the total number of threads across these instances doesn't exceed a threshold.

First, let's note that this is not a request that GNU Make could achieve. It only controls the total number of instances, with the naive assumption that each instance only uses one thread. That limited version is likely achievable, using an intermediate controller program (or Python script).
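For illustration, such a controller could be a small script that caps the number of concurrently running `zstd` instances, the way `make -j <n>` caps the number of compilers. This is only a sketch; the `zstd` flags and the limit of 4 concurrent instances are placeholder assumptions:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_INSTANCES = 4  # placeholder threshold, analogous to make -j 4

def compress(path):
    # One single-threaded zstd process per file; the executor's
    # worker limit bounds how many of them run at the same time.
    return subprocess.run(["zstd", "-q", "-T1", path]).returncode

def compress_all(paths):
    with ThreadPoolExecutor(max_workers=MAX_INSTANCES) as pool:
        return list(pool.map(compress, paths))
```

Like `make -j`, this bounds the instance count only; it does nothing about each instance's internal thread usage.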
Now, ensuring a total number of active threads across multiple workloads is something that the shared thread-pool model mentioned above could do. But then again, it's a totally different integration model, unsuitable for the command-line interface experience.
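To illustrate the shared thread-pool model in spirit only (this is not `libzstd`'s actual C API; `zlib.compress` stands in for a real compression job):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# One pool shared by every compression request: the total number of
# worker threads stays bounded no matter how many requests arrive.
SHARED_POOL = ThreadPoolExecutor(max_workers=8)  # global thread budget

def submit_compression(data: bytes):
    # Every caller funnels its work through the same bounded pool.
    return SHARED_POOL.submit(zlib.compress, data)
```

With this model, a hundred concurrent requests still share at most 8 threads between them, which is exactly the guarantee a CLI-level wrapper cannot easily provide.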
It seems you are looking for fairly advanced functionality, and while it can be imagined, and even done, it still has to be done. That's a non-trivial amount of work that someone would have to put in.
Is your feature request related to a problem? Please describe.
When running multiple `zstd` processes in parallel, it's difficult to know how many threads to give to each process. Giving each process as many threads as are available results in oversubscription; giving them only 1 may not be an optimal use of resources.

Describe the solution you'd like
GNU Make has this concept of a jobserver, which ensures that `make -j <n>` doesn't run more than `n` jobs in parallel (recursively). It is widely supported (e.g. gcc's LTO spawns parallel jobs from gcc respecting the jobserver; clang, cargo, ...) and is trivial to implement: https://www.gnu.org/software/make/manual/html_node/POSIX-Jobserver.html. This way, `make -j <n>` can be used to control the concurrency, and any `zstd` process could automatically take as many jobs as available to compress / decompress.

Additional context

This is particularly useful when Makefiles (or any other program conforming to the GNU Make jobserver) are used to create and compress multiple tarballs.
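For reference, the client side of the jobserver protocol mentioned above can be sketched roughly as follows. This is a simplification: it handles only the classic `--jobserver-auth=R,W` pipe form (not the newer fifo form), and error handling is omitted:

```python
import os
import re

def jobserver_fds(makeflags: str):
    """Parse the read/write pipe fds from a MAKEFLAGS value, if present."""
    m = re.search(r"--jobserver-auth=(\d+),(\d+)", makeflags)
    return (int(m.group(1)), int(m.group(2))) if m else None

def acquire_token(read_fd: int) -> bytes:
    # Blocks until a job slot is free; each byte in the pipe is one slot.
    return os.read(read_fd, 1)

def release_token(write_fd: int, token: bytes) -> None:
    # Return the slot so other jobs (or sub-makes) can reuse it.
    os.write(write_fd, token)
```

A jobserver-aware `zstd` would acquire one token per worker thread beyond the first (each process already owns one implicit slot) and release them as threads finish.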