facebook / zstd

Zstandard - Fast real-time compression algorithm
http://www.zstd.net
Other
23.58k stars 2.1k forks source link

[Huffman Assembly] Expand supported platforms #2789

Open terrelln opened 3 years ago

terrelln commented 3 years ago

The x86-64 Huffman assembly implementation is currently allow-listed to work on:

The macro to check this is HUF_ASM_SUPPORTED defined in huf_decompress.c.

I've selected a very restricted set of platforms because I want to be sure that it is well tested before enabling it. Any platform that shares a calling convention with Linux should just work, but I want to be cautious of details, like working with interrupt handlers & debuggers. It should be easy to add BSD to this list, if someone wants to test that everything works as expected.

The assembly currently doesn't work on Windows because it has a different calling convention. And I had a really hard time getting the detection macros right. So if someone wants to take that on, that would be great.

Tasks:

ghost commented 3 years ago

huf_decompress_amd64.S is written in AT&T syntax. Gcc's GAS (GNU Assembler) supports AT&T syntax.

Visual Studio's MASM (Microsoft Macro Assembler) only supports Intel syntax, so it may need to be rewritten to use it on Windows.

FYI, isa-l uses NASM to compile assembly code (Intel syntax), then isa-l supports gcc, clang, icc and VC compiler.

GAS also supports Intel syntax, maybe it's better to switch to Intel syntax in the long run.

Many x86 assemblers use Intel syntax, including NASM, FASM, MASM, TASM, and YASM. GAS, which originally used AT&T syntax, has supported both syntaxes since version 2.10 via the .intel_syntax directive.

Of course, this needs to be decided based on the work of your team.

gvanem commented 2 years ago

I have an error while building for x86 and hit this issue. Basically, I just did:

cd build\cmake
md build
cd build
cmake -G "NMake Makefiles" ..
nmake -nologo
...
LINK : fatal error LNK1104: cannot open file
'CMakeFiles\libzstd_shared.dir\F_\MingW32\src\Compression\zstd\lib\decompress\huf_decompress_amd64.S.obj'

How on earth does HUF_ENABLE_ASM_X86_64_BMI2 gets defined on x86 in this #ifdef soup:

#if !defined(HUF_DISABLE_ASM) &&                                  \
    HUF_ASM_SUPPORTED &&                                          \
    defined(__x86_64__) && (DYNAMIC_BMI2 || defined(__BMI2__))
# define HUF_ENABLE_ASM_X86_64_BMI2 1
...

And no reference to huf_decompress_amd64.S in the CMakeLists.txt files AFAICS. So how does this happen?

gvanem commented 2 years ago

And no reference to huf_decompress_amd64.S in the CMakeLists.txt files AFAICS.

Yes there is (via file(GLOB ...)). Fix that worked for me:

--- a/build/cmake/lib/CMakeLists.txt 2021-10-27 10:29:45
+++ /build/cmake/lib/CMakeLists.txt 2021-11-22 13:15:55
@@ -22,7 +22,13 @@

 file(GLOB CommonSources ${LIBRARY_DIR}/common/*.c)
 file(GLOB CompressSources ${LIBRARY_DIR}/compress/*.c)
-file(GLOB DecompressSources ${LIBRARY_DIR}/decompress/*.c ${LIBRARY_DIR}/decompress/*.S)
+
+if (WIN32)
+  file(GLOB DecompressSources ${LIBRARY_DIR}/decompress/*.c)
+else()
+  file(GLOB DecompressSources ${LIBRARY_DIR}/decompress/*.c ${LIBRARY_DIR}/decompress/*.S)
+endif()
+
 file(GLOB DictBuilderSources ${LIBRARY_DIR}/dictBuilder/*.c)
terrelln commented 2 years ago

GAS also supports Intel syntax, maybe it's better to switch to Intel syntax in the long run.

In the medium to long term we will look into switching to Intel syntax, if it is better supported.

Fix that worked for me:

That looks good to me! If you would like to put up a PR with that change I will accept it. Otherwise, I will put up a PR with the fix before the next release.

gvanem commented 2 years ago

Otherwise, I will put up a PR with the fix before the next release.

Please do since I suck at Git + PR.

terrelln commented 2 years ago

@gvanem HUF_ENABLE_ASM_X86_64_BMI2 definitely isn't enabled on x86 32-bit builds. I've added tests for Windows + CMake builds to our CI, and don't see any failures. I've further restricted the assembly code to be disabled whenever possible. Do you still see issues on the latest dev branch?

gvanem commented 2 years ago

Do you still see issues on the latest dev branch?

You mean with CMake? I'm not a great lover of CMake; I use my own home-brew GNU Makefile instead.

terrelln commented 2 years ago

@gvanem I mean regarding this failure:

LINK : fatal error LNK1104: cannot open file
'CMakeFiles\libzstd_shared.dir\F_\MingW32\src\Compression\zstd\lib\decompress\huf_decompress_amd64.S.obj'

I haven't been able to reproduce the failure, but I've done more to ensure that the Huffman assembly is ifdef-ed out whenever it isn't used.

shuffle2 commented 2 years ago

I strongly think that the assembly file should be moved to intrinsics. You get real cross-platform support without ugly hacks and/or duplicated code (and i wouldn't be surprised if the perf is actually better).

terrelln commented 2 years ago

I strongly think that the assembly file should be moved to intrinsics.

What do you mean by intrinsics? Like compiler builtins e.g. bextr?

You get real cross-platform support without ugly hacks and/or duplicated code (and i wouldn't be surprised if the perf is actually better).

You're definitely welcome to try! If we can get the same or better perf with C, we'd definitely switch back.

shuffle2 commented 2 years ago

What do you mean by intrinsics? Like compiler builtins e.g. bextr?

Functions that you find pulled in from e.g. x86intrin.h (non-msvc) or intrin.h (msvc). It does seem that, unfortunately, gcc does not expose an intrinsic for some parts of BMI2. Their own test for shlx relies on using -O2 and having the optimization pass emit shlx, which is, IMO, insane. The BMI2-related gcc header is here, altho they tend to have intrinsics sprinkled across many files. This can be worked around, but it's a poor choice on gcc's behalf.

You're definitely welcome to try! If we can get the same or better perf with C, we'd definitely switch back.

Can you give me some idea what compilers (e.g. min version of gcc/clang/msvc, others) you currently support? Also, any pointers for good perf test to run that checks this code specifically?

As I look through more of the code, there's also a pattern like: if (DYNAMIC_BMI2 && bmi2) { call inlined <function>_bmi2 }, where <function>_bmi2 has been decorated with function attribute to enable compiler to emit bmi2(+related) instructions for that function. This is also non-ideal, as msvc will not be able to use BMI2 is such cases (I started looking into this because I want the fast code on msvc), and you're just hoping the compiler optimizes the code into the BMI2 set (the compiler will probably not pick optimized instructions with debug flags/lower optimization levels). Generally, for this style of ISA enablement, you need to set the compiler flag on the file level instead of on the function to enable compatibility across compilers (e.g. clang/arm64 also does not like to enable certain extensions unless passed as cmdline arg - function attribute alone is not sufficient).

Anyway, I'll try to transliterate the .S to .c and see how ugly/fast it is or not :)