facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
22.77k stars 2.03k forks source link

[zstd][dict] Ensure that dictionary training functions are fully reen… #4086

Closed Adenilson closed 1 day ago

Adenilson commented 3 days ago

…trant

The two main functions used for dictionary training using the COVER algorithm require initialization of a COVER_ctx_t where a call to qsort() is performed.

The issue is that the standard C99 qsort() function doesn't offer a way to pass an extra parameter for the comparison function callback (e.g. a pointer to a context) and currently zstd relies on a global static variable to hold a pointer to a context needed to perform the sort operation.

If a zstd library user invokes either ZDICT_trainFromBuffer_cover or ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the global context may be overwritten before/during the call/execution to qsort() in the initialization of the COVER_ctx_t, thus yielding to crashes and other bad things (Tm) as reported on issue #4045.

Enters qsort_r(): it was designed to address precisely this situation, to quote from the documention [1]: "the comparison function does not need to use global variables to pass through arbitrary arguments, and is therefore reentrant and safe to use in threads."

It is available with small variations for multiple OSes (GNU, BSD[2], Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as part of the . Let's hope that compilers eventually catch up with it.

For now, we have to handle the small variations in function parameters for each platform.

The current fix solves the problem by allowing each executing thread pass its own COVER_ctx_t instance to qsort_r(), removing the use of a global pointer and allowing the code to be reentrant.

Unfortunately for *BSD, we cannot leverage qsort_r() given that its API has changed for newer versions of FreeBSD (14.0) and the other BSD variants (e.g. NetBSD, OpenBSD) don't implement it.

[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html [2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r [3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170 [4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

Adenilson commented 3 days ago

This is the latest version for #4078.

Adenilson commented 3 days ago

Main changes are: a) Reuse the same code for both WIN + APPLE when possible. b) Only use WIN_CDECL where needed. c) Rebased all changes in a single patch. d) Better comments on zstd_deps.h.

Adenilson commented 3 days ago

Let's see how the CI bots digest the latest patch version. :-)

Adenilson commented 3 days ago

@Cyan4973 thanks a lot for the review and comments, I really appreciate it.

I was pleasantly surprised how the CI works (i.e. it really helps a lot to catch issues early on) and how good is the zstd code base (i.e. it is well written and things actually make sense in general).

I will start working on the next patch. :-)