Open EbiArnie opened 8 months ago
This is because the parallel_compression example builds against the static library, which builds by default with multi-threading off.
You can get it working like so, from the examples directory:
make -j12 -C ../../../lib clean libzstd.a CPPFLAGS="-DZSTD_MULTITHREAD"
make parallel_compression
./parallel_compression some_large_file 100000 12
There's honestly a lot of problems with this example:
job->done and checking jobs[i].done.
-j12 or CPPFLAGS to make parallel_compression, because the makefile builds the library using an entirely different make invocation.
To fix these:
$(MAKE) instead of make.
@Cyan4973 I'd be happy to look into this further, but I'd like guidance on what is/isn't welcome in this repo before I start, if you don't mind. (I've written one of the most performant parallel-xz implementations, if you need proof I can do this.) Is there willingness to investigate adding --seekable to the standard zstd binary? If not, what sorts of changes to the example would feel ok to you, and which wouldn't?
Thanks!
Hey, thanks for the detailed analysis.
Having --seekable be part of default zstd would be very nice indeed.
Meanwhile, I'm using exactly the tool you linked to, t2sz.
As far as this example goes - it would be good to have working code of course.
Yes, I think it's a good idea to start working on this extension. --seekable seems like a nice way to express it.
I'm slightly concerned about the general complexity of the complete solution, but as long as it can be broken down into smaller incremental steps of tractable complexity, I'm sure it will be possible to progress regularly towards the goal.
Yes, I'm a little concerned about the complexity as well. A few reasons:
ZSTD_compressStream2(), which the docs at least imply compresses just one frame.
The options I see for expressing seekable mode are:
ZSTD_c_frameSize=BYTES. Then make ZSTD_compressStream2 and friends just write frame headers/footers at the right places. We just re-use our existing concurrency model. The docs change to make it clear that compressStream2 can output multiple frames.
a. There are a few potential variants of this, e.g. making the parameter "number of blocks in a frame", or reusing jobSize as the frame size.
b. getFrameProgression() becomes very poorly named now, ugh. Maybe we add an alias getProgression()?
I'm partial to option 1.
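To make the fixed-frame-size idea in option 1 a bit more concrete, here is a minimal sketch in plain C of how an input would be cut into frames. It makes no zstd calls. FrameSpan, frame_count and frame_span are hypothetical names invented for illustration, and frame_size stands in for the proposed ZSTD_c_frameSize value, which is not an existing zstd parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of the input slice covered by one frame. */
typedef struct {
    size_t offset;  /* where this frame's uncompressed data starts */
    size_t size;    /* how many uncompressed bytes the frame covers */
} FrameSpan;

/* Number of frames needed to cover input_size bytes (ceiling division).
 * In this sketch an empty input yields zero frames. */
static size_t frame_count(size_t input_size, size_t frame_size)
{
    assert(frame_size > 0);
    return input_size / frame_size + (input_size % frame_size != 0);
}

/* Span of frame `index`. Only the last frame may be shorter than frame_size. */
static FrameSpan frame_span(size_t input_size, size_t frame_size, size_t index)
{
    FrameSpan span;
    assert(index < frame_count(input_size, frame_size));
    span.offset = index * frame_size;
    span.size = input_size - span.offset;   /* bytes left from this offset */
    if (span.size > frame_size) {
        span.size = frame_size;             /* clamp every frame but the last */
    }
    return span;
}
```

With a 10-byte input and a frame size of 4, this gives three frames covering bytes 0-3, 4-7 and 8-9. A real implementation would close the current frame and start a new one at each of these boundaries.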
The current seekable format is designed around the idea that each independent chunk is a frame. So yes, by design, it's a multi-frames topic.
Changing ZSTDMT_* to output multiple frames instead of a single one is technically possible.
It's not even that difficult to imagine: just close and restart a frame at decided breakpoints.
It introduces a few more questions about the nature of the interface, because now additional information (boundaries between frames, or eventually content size within each frame) becomes useful to transmit (to write the jump table at the end, for example).
But it's also a high-risk operation, because this code is used to compress everything from the zstd CLI. So a subtle vulnerability introduced here would have a pretty huge blast radius.
Option 3 has the advantage of containing the risks to the newly added code only, with less chance of impacting normal operations.
A possible intermediate could be to duplicate zstdmt_compress in order to make all the modifications there, eventually taking over the original one if it manages to become a clear and safe superset. Maybe it's what you called Option 2.
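As a rough illustration of why per-frame boundaries matter for the jump table mentioned above, here is a minimal sketch of a lookup. It assumes each table entry records one frame's compressed and decompressed size. JumpEntry, SeekTarget and locate are hypothetical names, not zstd API, and the linear scan is chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical jump-table entry: sizes of one independently compressed frame. */
typedef struct {
    size_t compressed_size;
    size_t decompressed_size;
} JumpEntry;

/* Hypothetical lookup result. */
typedef struct {
    size_t frame_index;        /* index of the frame holding the target */
    size_t compressed_offset;  /* where that frame starts in the compressed file */
    size_t offset_in_frame;    /* target position inside the decompressed frame */
} SeekTarget;

/* Walks the table, accumulating compressed and decompressed offsets.
 * Returns 0 and fills *out on success, or -1 if target is past the end. */
static int locate(const JumpEntry *table, size_t n, size_t target, SeekTarget *out)
{
    size_t c_off = 0;  /* compressed bytes before frame i */
    size_t d_off = 0;  /* decompressed bytes before frame i */
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (target < d_off + table[i].decompressed_size) {
            out->frame_index = i;
            out->compressed_offset = c_off;
            out->offset_in_frame = target - d_off;
            return 0;
        }
        c_off += table[i].compressed_size;
        d_off += table[i].decompressed_size;
    }
    return -1;
}
```

A reader would then seek to compressed_offset, decompress only that one frame, and skip offset_in_frame bytes of its output. This is why the compressor has to report where each frame ends.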
Ok, I will try that approach. I'm going on vacation soon, so maybe in 6 weeks I'll be back with more :)
Sounds promising. I would still be happy with --seekable even without parallelism, but it sounds like you have it mostly figured out already. Thanks!
Describe the bug
The parallel compression example does not seem to be working in parallel. It runs on a single thread only, regardless of how many threads are requested.
To Reproduce
./parallel_compression data.txt $((2*1024*1024)) 4
data.txt can be any sufficiently large file so that compression takes a handful of seconds. In my case, this is ASCII data, 2.8 GB.
Expected behavior
Parallel execution should use multiple cores and be faster than single-threaded. The full zstd executable works as expected; given the -T option it is faster than without and saturates all cores.
Also, I'd love for --seekable to become a standard option for zstd, but that should probably be a separate issue.
Desktop: