facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net

How to enable the checksum option #914

Closed schmiidt closed 6 years ago

schmiidt commented 6 years ago

I hope it is okay to post a question here (otherwise I apologize).

I am reading the zstd header and trying to understand how to add a checksum to the last compression block. I am using the same streaming context to compress multiple files in the following manner.

zcs = ZSTD_createCStream();

/* ZSTD_CCtx_setParameter(zcs, ZSTD_p_checksumFlag, 1); */

/* first file */
ZSTD_initCStream(zcs, 3);
ZSTD_compressStream(zcs, ...);
...
ZSTD_endStream(zcs, ...);

/* second file */
ZSTD_initCStream(zcs, 3);
ZSTD_compressStream(zcs, ...);
...
ZSTD_endStream(zcs, ...);

/* nth file */
ZSTD_initCStream(zcs, 3);
ZSTD_compressStream(zcs, ...);
...
ZSTD_endStream(zcs, ...);

ZSTD_freeCStream(zcs);

Ideally, I would like to enable the checksum once right after ZSTD_createCStream, but I suppose that the subsequent ZSTD_initCStream would then overwrite the flag again with its default value. Is that correct?

There are several other variants of initCStream_* in the advanced section. I wonder if any of them would play nicely together with ZSTD_CCtx_setParameter(zcs, ZSTD_p_checksumFlag, 1) or do I need to use another API to enable the checksum in my use-case?

Cyan4973 commented 6 years ago

Indeed, ZSTD_initCStream() will change parameters to its own defaults.

The function that pairs with ZSTD_CCtx_setParameter() is ZSTD_compress_generic(). I think you can use it in your example, which becomes:

zcs = ZSTD_createCStream();

ZSTD_CCtx_setParameter(zcs, ZSTD_p_checksumFlag, 1);
ZSTD_CCtx_setParameter(zcs, ZSTD_p_compressionLevel, 3);

/* first file */
ZSTD_compress_generic(zcs, out, in, ZSTD_e_continue);
...
ZSTD_compress_generic(zcs, out, in, ZSTD_e_end);

/* nth file */
ZSTD_compress_generic(zcs, out, in, ZSTD_e_continue);
...
ZSTD_compress_generic(zcs, out, in, ZSTD_e_end);

ZSTD_freeCStream(zcs);

ZSTD_compress_generic() keeps the same parameters across compression sessions.

schmiidt commented 6 years ago

Great, I really like the new advanced compression API. It is much clearer and easier to use in practice. I get the feeling that you are moving away from the advanced streaming functions, so I just rewrote my example without using any of them.

cctx = ZSTD_createCCtx();

ZSTD_CCtx_setParameter(cctx, ZSTD_p_checksumFlag, 1);
ZSTD_CCtx_setParameter(cctx, ZSTD_p_compressionLevel, 3);

/* first file */
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_continue);
...
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_end);

/* second file */
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_continue);
...
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_end);

/* nth file */
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_continue);
...
ZSTD_compress_generic(cctx, &output, &input, ZSTD_e_end);

ZSTD_freeCCtx(cctx);

At first, I was a little surprised to learn that the dictionary is reset after each ZSTD_e_end. I would not have expected that, but I see how it makes more sense this way. Would it be possible to let the dictionary evolve for, say, 64 files before resetting it to its default state? I hope you do not mind this additional question, but it would help me provide reasonable latency while still maintaining a high compression ratio.

Cyan4973 commented 6 years ago

I believe you are looking for the ZSTD_e_flush directive. It preserves the context, so past content can still be referenced when compressing the next data.
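To make the "64 files per reset" idea concrete, here is a minimal sketch of the bookkeeping only. It does not call zstd: `finish_mode`, `FINISH_FLUSH`, `FINISH_END` and `finish_mode_for` are hypothetical names invented for this illustration, standing in for the choice between ZSTD_e_flush and ZSTD_e_end as the final directive for each file.

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's end directive. The real
 * constants discussed in this thread are ZSTD_e_flush and ZSTD_e_end. */
typedef enum { FINISH_FLUSH, FINISH_END } finish_mode;

/* Decide how to finish file number `index` (0-based) out of `total`
 * files when up to `group` consecutive files share one frame. */
finish_mode finish_mode_for(size_t index, size_t total, size_t group)
{
    if (group == 0)
        group = 1;                      /* treat 0 as "one file per frame" */
    if (index + 1 >= total)
        return FINISH_END;              /* last file: always close the frame */
    if ((index + 1) % group == 0)
        return FINISH_END;              /* group is full: close and start fresh */
    return FINISH_FLUSH;                /* keep the frame and its history open */
}
```

With `group` set to 64, files 0 to 62 would be finished with a flush and file 63 with an end, so history carries across the first 64 files and then starts over. The frame itself records no file boundaries, so a caller following this scheme would have to track them separately.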

schmiidt commented 6 years ago

I have thought about that, but then I would lose the checksum for each file, right? Or does ZSTD_e_flush generate a checksum as well? I was under the impression that only ZSTD_e_end did.

Cyan4973 commented 6 years ago

You're right, only ZSTD_e_end generates a checksum, at the end of the frame. ZSTD_e_flush doesn't end a frame, so it cannot generate an "end of frame" checksum. Unfortunately, the specification defines no "intermediate" checksum.
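As a way to confirm that the flag took effect, the following sketch inspects the first bytes of a compressed frame. It relies on the layout given in the Zstandard format specification, which this thread does not quote: a 4-byte little-endian magic number 0xFD2FB528, followed by a Frame_Header_Descriptor byte whose bit 2 is the Content_Checksum_flag. `FRAME_MAGIC` and `frame_has_checksum` are names made up for this example, not part of the zstd API.

```c
#include <stddef.h>
#include <stdint.h>

/* Zstandard frame magic number, stored on disk as bytes 28 B5 2F FD. */
#define FRAME_MAGIC 0xFD2FB528u

/* Returns 1 if the frame header announces a content checksum,
 * 0 if it does not, and -1 if the buffer is too short or does not
 * start with a regular Zstandard frame. */
int frame_has_checksum(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    if (buf == NULL || len < 5)
        return -1;

    /* Assemble the little-endian magic number byte by byte. */
    magic = (uint32_t)buf[0]
          | ((uint32_t)buf[1] << 8)
          | ((uint32_t)buf[2] << 16)
          | ((uint32_t)buf[3] << 24);
    if (magic != FRAME_MAGIC)
        return -1;

    /* Bit 2 of the Frame_Header_Descriptor is the Content_Checksum_flag. */
    return (buf[4] >> 2) & 1;
}
```

Passing the start of each compressed file to `frame_has_checksum` should return 1 once the checksum parameter is set. The function only reads the header flag and does not verify the checksum value stored at the end of the frame.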

schmiidt commented 6 years ago

OK, thanks for taking the time to answer my questions.