cryptimeleon / mclwrap

A wrapper to bring the mcl pairing library into cryptimeleon
BSD 3-Clause "New" or "Revised" License
3 stars 1 forks source link

Enable additional performance optimization options for libmcljava #40

Closed WorldofJARcraft closed 2 years ago

WorldofJARcraft commented 2 years ago

Summary

The install_fast_mcljava_linux_mac.sh script - as shipped by mclwrap currently - relies on the libgmp that is provided by the system. Because this libgmp has to work on various CPUs, it cannot be tailored to the CPU it is currently running on, and will restrict itself to using portable instructions instead of utilizing everything the current CPU supports (e.g. AVX, AES-NI, etc.).
Since the script aims at generating a library for the local machine only, I extended it to be able to:

What I tested

What I did not test

JanBobolz commented 2 years ago

Hey, thanks!

Unfortunately, I cannot build mcl ffi with the build script on my macOS machine.

/tmp/mcl/ffi/java/../../include/mcl/gmp_util.hpp:37:10: fatal error: 'gmpxx.h' file not found
#include <gmpxx.h>
         ^~~~~~~~~
1 error generated.

when executing line 89 here.

The file gmpxx.h exists in /usr/local/include/gmpxx.h for me (it seems to have been put there by the gmp install script), but it's not picked up when building mcl ffi (doesn't seem to be a problem when building mcl itself). It does say

-- GMP found - linking it.
-- MCL_LIBRARY=/tmp/mcl/build/lib/libmcl.a GMP_LIBRARY=/usr/local/lib/libgmp.dylib

when cmaking ffi, but I guess the headers are not loaded?!

I'm not an expert in this whole cmake stuff you set up, so I thought I'd ask before continuing to blindly debug this.

WorldofJARcraft commented 2 years ago

Hey, thanks!

Unfortunately, I cannot build mcl ffi with the build script on my macOS machine.

/tmp/mcl/ffi/java/../../include/mcl/gmp_util.hpp:37:10: fatal error: 'gmpxx.h' file not found
#include <gmpxx.h>
         ^~~~~~~~~
1 error generated.

when executing line 89 here.

The file gmpxx.h exists in /usr/local/include/gmpxx.h for me (it seems to have been put there by the gmp install script), but it's not picked up when building mcl ffi (doesn't seem to be a problem when building mcl itself). It does say

-- GMP found - linking it.
-- MCL_LIBRARY=/tmp/mcl/build/lib/libmcl.a GMP_LIBRARY=/usr/local/lib/libgmp.dylib

when cmaking ffi, but I guess the headers are not loaded?!

I'm not an expert in this whole cmake stuff you set up, so I thought I'd ask before continuing to blindly debug this.

It looks like /usr/local/include not being in the default C++ include path on Mac (it is on Linux and the BSDs, however). I have pushed a revision of the script that adds /usr/local/include to the C++ includepath, I hope this will fix it. Can you test again with the new version? An alternative would be to install gmp to /usr/{include,lib}, however this might overwrite a system version of gmp and be incompatible to software that uses the old version, and I would prefer the previous solution. On another note, I noticed that the script might not work on arm64 since clang does not support -march=native on arm64. I fixed that as well by using -mcpu=native on aarch64 if clang is present (verified on a Raspberry Pi with FreeBSD).

JanBobolz commented 2 years ago

I have pushed a revision of the script that adds /usr/local/include to the C++ includepath, I hope this will fix it. Can you test again with the new version?

That indeed fixed the gmp linking issue. I also had to include the -I /usr/local/opt/openssl/include flag on macOS so that openssl could be linked, too. Do you have an intuition as to why the cmake process seems so much less able to find libraries than the old mcl-Makefile process we used to run in these scripts?

Performance for the newly compiled mcl seems about the same as the old for the limited tests I've run. But in general, this pull request seems like a good idea, so let's move forward with it.

WorldofJARcraft commented 2 years ago

In the Makefile-based build system, C flags like include paths are specified manually in common.mk per operating system (configuration).
CMake attempts to determine these paths dynamically, based on built-in rules for compilers, operating systems, etc, and might not detect libraries and headers in non-standard locations.
On the other hand, CMake can usually auto-detect configurations even for platform the code was not manually set up for, and usually requires less maintenance.

JanBobolz commented 2 years ago

I've tested the new binaries a bit more and compared them to the "slow" ones packaged with mclwrap. And shockingly, the packaged binaries seem to outperform the new ones consistently. For example in this test (but with 1000 iterations to make the numbers more stable).

For example: New:

Join
User: 3 ms
Provider: 4 ms
Earn
User: 4 ms
Provider: 5 ms
Spend
User: 11 ms
Provider: 18 ms

Packaged:

Join
User: 3 ms
Provider: 3 ms
Earn
User: 3 ms
Provider: 4 ms
Spend
User: 9 ms
Provider: 17 ms

Any idea what's going on there? Can you reproduce this?

WorldofJARcraft commented 2 years ago

For the portable binaries, I disabled linking against GMP (since it is a potential compatibility problem) and used mcl's vint instead (which works without external dependencies). For the "fast" script, I linked against GMP since it is the default for mcl and I assumed it would be faster. It is possible that this is not the case on every system, though, or that they are sometimes equal within the margin of error.
I ran the test on my Xeon E5620 system (2010 server CPU) and the "fast" library that linked GMP was significantly faster:

With gmp mcljava:

Join
User: 19 ms
Provider: 9 ms
Earn
User: 19 ms
Provider: 27 ms
Spend
User: 46 ms
Provider: 53 ms

With packaged mcljava:

Join
User: 24 ms
Provider: 14 ms
Earn
User: 31 ms
Provider: 48 ms
Spend
User: 52 ms
Provider: 93 ms
JanBobolz commented 2 years ago

I ran the test on my Xeon E5620 system (2010 server CPU) and the "fast" library that linked GMP was significantly faster:

Alright, cool! Thanks for testing that. So overall, this change seems to be positive for performance.
We'll do another test on another machine, then I think this is ready to be merged. We may later want to allow users to change to vint on machines where that is faster.

feidens commented 2 years ago

Hi, I'm the silent observer of this nice work on the scripts and the owner of another machine. Therefore, I'm sorry to report that the new script does not work on M1 (Pro). Before the changes it worked just fine. What ever this means.

A short excerpt.

----- Building mcl -----
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found GMP: /opt/homebrew/Cellar/gmp/6.2.1_1/include (found version "6.2.1") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/mcl/build
Scanning dependencies of target mcl
[  8%] Building CXX object CMakeFiles/mcl.dir/src/fp.cpp.o
[ 16%] Building ASM object CMakeFiles/mcl.dir/src/asm/arm.s.o
/tmp/mcl/src/asm/arm.s:2:2: error: unknown directive
 .syntax unified
 ^
/tmp/mcl/src/asm/arm.s:3:2: error: unknown directive
 .eabi_attribute 67, "2.09" @ Tag_conformance
 ^
/tmp/mcl/src/asm/arm.s:4:2: error: unknown directive
 .eabi_attribute 6, 1 @ Tag_CPU_arch
 ^
/tmp/mcl/src/asm/arm.s:5:2: error: unknown directive
 .eabi_attribute 8, 1 @ Tag_ARM_ISA_use

Maybe also of interest is the error that is returned at the end, but it seems rather natural:

CMake Error at CMakeLists.txt:42 (find_library):
  Could not find MCL_LIBRARY using the following names: mcl
feidens commented 2 years ago

I have the feeling skipping the CommandLineTools is not a good idea actually on an M1 Mac.

For completeness this is the output if I run the fast mac script on dffd151.

----- Building mcl -----
Makefile:219: warning: overriding commands for target `src/base64.ll'
Makefile:216: warning: ignoring old commands for target `src/base64.ll'
c++ -o src/gen src/gen.cpp -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -DMCL_USE_VINT -DMCL_VINT_FIXED_BUFFER
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/fp.cpp -o obj/fp.o -MMD -MP -MF obj/fp.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/bn_c256.cpp -o obj/bn_c256.o -MMD -MP -MF obj/bn_c256.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/bn_c384.cpp -o obj/bn_c384.o -MMD -MP -MF obj/bn_c384.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/bn_c384_256.cpp -o obj/bn_c384_256.o -MMD -MP -MF obj/bn_c384_256.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/bn_c512.cpp -o obj/bn_c512.o -MMD -MP -MF obj/bn_c512.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/she_c256.cpp -o obj/she_c256.o -MMD -MP -MF obj/she_c256.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/she_c384.cpp -o obj/she_c384.o -MMD -MP -MF obj/she_c384.d
c++ -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1 -c src/she_c384_256.cpp -o obj/she_c384_256.o -MMD -MP -MF obj/she_c384_256.d
src/gen -u 64 -f src/func.list > src/base64.ll
llvmVer=0x70
ar r lib/libmclbn256.a obj/bn_c256.o
ar: creating archive lib/libmclbn256.a
ar r lib/libmclbn384.a obj/bn_c384.o
ar: creating archive lib/libmclbn384.a
ar r lib/libmclbn384_256.a obj/bn_c384_256.o
ar: creating archive lib/libmclbn384_256.a
ar r lib/libmclbn512.a obj/bn_c512.o
ar: creating archive lib/libmclbn512.a
c++ -c src/base64.ll -o obj/base64.o -g3 -Wall -Wextra -Wformat=2 -Wcast-qual -Wcast-align -Wwrite-strings -Wfloat-equal -Wpointer-arith -Wundef  -I include -I test -fomit-frame-pointer -DNDEBUG -fno-stack-protector -O3  -DMCL_USE_VINT -DMCL_DONT_USE_OPENSSL -fPIC -DMCL_USE_LLVM=1
clang: warning: argument unused during compilation: '-I include' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-I test' [-Wunused-command-line-argument]
warning: overriding the module target triple with arm64-apple-macosx12.0.0 [-Woverride-module]
WorldofJARcraft commented 2 years ago

The error on Apple Silicon seems to be caused by an omission in the upstream CMakeLists.txt from mcl.
OSX on arm seems to self-identify as "arm64" instead of the formally correct "aarch64" (different name for the same ISA like amd64 vs. x64 vs. x86_64 on IA32), and the CMakeLists.txt checks only for "aarch64". Since "arm64" contains "arm", CMake continues to use the arm (32 bit) assembly in the next else-if, which uses some directives that do not work on aarch64, causing the compilation to fail. On FreeBSD / aarch64 (where I tested), the name "aarch64" is used, so the correct assembly listing is chosen.
The Makefile apparently fixed this in the support m1-mac commit, however the CMakeLists.txt was seemingly forgotten.
I have pushed a new version of the install script that uses my fork of mcl, where I attempted to fix this. Also, I added more debug output.
@feidens Can you try building again using the latest script?
If I am correct, the build should continue, and I will send a PR to mcl with the fix. In this case, the problem is not related to my changes.
Regarding the -Woverride-module: That seems to be a mostly llvm/clang-specific complaint. When clang is used, mcl used LLVM bitcode for some functions, and the target is more generic than the target selected by mcpu=native. As soon as LLVM detects this mismatch, it uses the provided target tuple instead of the target that LLVM defaults to (which is generic aarch64, since nothing more specific is set, I think), so this should be fine. If you are concerned about this warning, I can disable native optimization on arm and aarch64 Apple ABI, the gains at this point should be negligible anyway (I expect the largest gain in the optimized GMP).
Regarding the Could not find MCL_LIBRARY using the following names: mcl error: This seems to be the complaint of the Java FFI build script, it happens when mcl was not properly compiled and it cannot find and link the FFI against it. Also, I could change the script to use the Makefile-based build system instead of the CMake build system if you wanted, since the Makefiles seem to be more actively maintained. However, in some environments (especially Cygwin or older Solaris / BSD releases), CMake works somewhat reliably in my experience, while the Makefile-based system needs manual setup (this is what common.mk is for), which is why I used it originally.

feidens commented 2 years ago

Looks roughly the same except Building ASM object CMakeFiles/mcl.dir/src/asm/aarch64.s.o instead of Building ASM object CMakeFiles/mcl.dir/src/asm/arm.s.o.

HEAD ist jetzt bei 5833b40 Allow arm64 name for aarch64 ISA in CMakeLists.txt
----- Deleting currently installed version of mcl -----
----- Building mcl -----
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
Using assembly for arm64 / aarch64 since CMAKE_SYSTEM_PROCESSOR is arm64
-- Found GMP: /opt/homebrew/Cellar/gmp/6.2.1_1/include (found version "6.2.1") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/mcl/build
Scanning dependencies of target mcl
[  8%] Building CXX object CMakeFiles/mcl.dir/src/fp.cpp.o
[ 16%] Building ASM object CMakeFiles/mcl.dir/src/asm/aarch64.s.o
/tmp/mcl/src/asm/aarch64.s:5:2: error: unknown directive
 .type makeNIST_P192L,@function
 ^
/tmp/mcl/src/asm/aarch64.s:13:2: error: unknown directive
 .size makeNIST_P192L, .Lfunc_end0-makeNIST_P192L
 ^
/tmp/mcl/src/asm/aarch64.s:17:2: error: unknown directive
 .type mcl_fpDbl_mod_NIST_P192L,@function
WorldofJARcraft commented 2 years ago

On closer inspection, on Mac, the llvm-ir file base64.ll seems to be used directly. As long as clang is the compiler, it should process the IR like system-specific assembly and output a binary. I have added another commit (d2f583d4d73895fe4a4c7dba500cf9ba809f8d9b), using base64.ll accordingly.
If this fails, can you try the previous version, and add -DMCL_USE_LLVM=ON in line 93 of the install_fast script? This should bypass the assembly sources altogether and generate fresh LLVM-IR.

feidens commented 2 years ago

For d2f583d4d73895fe4a4c7dba500cf9ba809f8d9b

----- Building mcl -----
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
Using assembly for arm since CMAKE_SYSTEM_PROCESSOR is arm64
-- Found GMP: /opt/homebrew/Cellar/gmp/6.2.1_1/include (found version "6.2.1") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/mcl/build
Scanning dependencies of target mcl
[  8%] Building CXX object CMakeFiles/mcl.dir/src/fp.cpp.o
[ 16%] Building ASM object CMakeFiles/mcl.dir/src/asm/arm.s.o
/tmp/mcl/src/asm/arm.s:2:2: error: unknown directive
 .syntax unified

with your mcl commit before that

----- Building mcl -----
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Found GMP: /opt/homebrew/Cellar/gmp/6.2.1_1/include (found version "6.2.1") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/mcl/build
[  7%] Building CXX object CMakeFiles/gen.dir/src/gen.cpp.o
[ 14%] Linking CXX executable bin/gen
[ 14%] Built target gen
[ 21%] Generating base64.ll
llvmVer=0x70
[ 28%] Generating base64.o
warning: overriding the module target triple with arm64-apple-macosx12.0.0 [-Woverride-module]
1 warning generated.
[ 28%] Built target gen_base64.o
[ 35%] Building CXX object CMakeFiles/mcl.dir/src/fp.cpp.o
[ 42%] Linking CXX shared library lib/libmcl.dylib
[ 42%] Built target mcl
[ 50%] Building CXX object CMakeFiles/mcl_st.dir/src/fp.cpp.o
[ 57%] Linking CXX static library lib/libmcl.a
[ 57%] Built target mcl_st
[ 64%] Building CXX object CMakeFiles/mclbn256.dir/src/bn_c256.cpp.o
[ 71%] Linking CXX static library lib/libmclbn256.a
[ 71%] Built target mclbn256
[ 78%] Building CXX object CMakeFiles/mclbn384.dir/src/bn_c384.cpp.o
[ 85%] Linking CXX static library lib/libmclbn384.a
[ 85%] Built target mclbn384
[ 92%] Building CXX object CMakeFiles/mclbn384_256.dir/src/bn_c384_256.cpp.o
[100%] Linking CXX static library lib/libmclbn384_256.a
[100%] Built target mclbn384_256

but at the end after [100%] Linking CXX shared library libmcljava.dylib

/opt/homebrew/Cellar/cmake/3.22.2/bin/cmake -E cmake_link_script CMakeFiles/mcljava.dir/link.txt --verbose=1
/Library/Developer/CommandLineTools/usr/bin/c++ -mcpu=native -I /usr/local/include -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.1.sdk -dynamiclib -Wl,-headerpad_max_install_names -o libmcljava.dylib -install_name @rpath/libmcljava.dylib CMakeFiles/mcljava.dir/mcl_wrap.cxx.o  /opt/homebrew/lib/libgmp.dylib /tmp/mcl/build/lib/libmcl.a 
Undefined symbols for architecture arm64:
  "_mcl_fpDbl_add3L", referenced from:
      void mcl::fp::setOp<3ul>(mcl::fp::Op&, mcl::fp::Mode) in libmcl.a(fp.cpp.o)
      mcl::fp::DblAdd<3ul, mcl::fp::Ltag>::f in libmcl.a(fp.cpp.o)
  "_mcl_fpDbl_add4L", referenced from:
      void mcl::fp::setOp<4ul>(mcl::fp::Op&, mcl::fp::Mode) in libmcl.a(fp.cpp.o)
      mcl::fp::DblAdd<4ul, mcl::fp::Ltag>::f in libmcl.a(fp.cpp.o)
  "_mcl_fpDbl_add6L", referenced from:

and so on...

ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [libmcljava.dylib] Error 1
make[2]: *** [CMakeFiles/mcljava.dir/all] Error 2
make[1]: *** [CMakeFiles/mcljava.dir/rule] Error 2
make: *** [mcljava] Error 2
----- Copying mcl java shared library to /usr/lib/ -----
cp: libmcljava.dylib: No such file or directory
----- Installation finished successfully. Deleting mcl repository folder -----
----- Done -----
WorldofJARcraft commented 2 years ago

The functions it can't link are exactly the ones that should be present in the assembly source. Somehow, they are not generated by the gen binary. Can you try one last time with my latest mcl commit d668eb4c739bdd06942eae90f70c4c27605cd6bb and without -DMCL_USE_LLVM? If this does not work, I am afraid I am out of ideas. In this case, all I can offer is to build a non-optimized library on M1 macs (which is at least better than no library at all), and try to raise the issue with mcl.

feidens commented 2 years ago

Unfortunately it doesn't work.

----- Building mcl -----
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
Using assembly for arm64 / aarch64 (Apple ABI) since CMAKE_SYSTEM_PROCESSOR is arm64
-- Found GMP: /opt/homebrew/Cellar/gmp/6.2.1_1/include (found version "6.2.1") 
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/mcl/build
[ 10%] Building CXX object CMakeFiles/mcl.dir/src/fp.cpp.o
[ 20%] Linking CXX shared library lib/libmcl.dylib
Undefined symbols for architecture arm64:
  "_mcl_fpDbl_add3L", referenced from:
      void mcl::fp::setOp<3ul>(mcl::fp::Op&, mcl::fp::Mode) in fp.cpp.o
      mcl::fp::DblAdd<3ul, mcl::fp::Ltag>::f in fp.cpp.o
  "_mcl_fpDbl_add4L", referenced from:

and so on. Then:

ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [lib/libmcl.1.22.dylib] Error 1
make[1]: *** [CMakeFiles/mcl.dir/all] Error 2
make: *** [all] Error 2
feidens commented 2 years ago

If I just build mcl as outlined by the author of mcl it works (also if I use our old script). Do I understand it correctly that you suggest that in these cases a non-optimized version is built and this is what you could offer now?

WorldofJARcraft commented 2 years ago

Just to be clear: the CMake-Build for mcl existed before I forked the project, and is a documented way of building the library too (search for "How to build with CMake" in mcl's top-level Readme). It was mostly used on Windows, because UNIX Makefiles do not work on Windows without a compatibility environment like Cygwin. However, it should work fine on other platforms as well (and in my testing, it worked on Linux and FreeBSD without any issue).
What my original PR to mcl added was a CMake-Build for the Java FFI, since there was no way of building mcljava.dll automatically on Windows (while mcl.dll could be using CMake).
The intention of this PR was to improve performance of mcljava by a) compiling the required libgmp from source, while letting it select the assembly sources that are optimal for the installed CPU, and b) using more aggressive tuning flags for the C / C++ compiler in mcl. I originally developed the script for benchmarking in my own project, saw a small gain and wanted to contribute it back.
I used CMake since it allows changing both the GMP library directory and the C / C++ compiler flags in a reliable, cross-platform way. The Makefiles do not allow this without string replacement in common.mk or overwriting the system-provided libgmp, both of which might break this or other programs on the system, which is why I did not attempt this.
I have changed the script to use of make if no second argument is provided, and CMake with the tuning flags and the custom GMP library if a second argument is provided. I hope this works on every platform, and it should not break existing scripts.
I have also disabled the performance tuning and added a warning specifically on Apple M1, so that dependent scripts that set the second options do not start failing on this platform as well.
If no second option is provided and the default flags are used, compiler and linker will a) rely on the system-provided libgmp, which has to support a wider range of CPUs and is not optimized to the installed CPU but should work reasonably well, and b) default architecture flags (which means that assembly generated will work on every CPU that supports the current instruction set, but features of newer CPUs like AVX, etc. will not be used).
How much the performance increases by using optimization is different from CPU to CPU. Since GMP mostly consists of hand-written assembler, including optimizations for specific ISA extensions like AVX, I assume that on some CPUs, the gain will be quite large (e.g., see the results on my 2010 Xeon from above).

feidens commented 2 years ago

I tested your new script vs our old one on my old 2012 MacBook. The results are similar to @JanBobolz one. So the old script results in better performance. Our current idea is that we could provide as a standard our old script and as an optional solution your new script for those that potentially want optimized performance on some machines. With a good explanation in the readme for "some machines" this could work.

JanBobolz commented 2 years ago

I've just tested the script in a fresh-ish Ubuntu VM on my Intel CPU.

Performance

Overall, performance is roughly: packaged < newInstallScriptWithGmpOption < newInstallScriptWithoutGmpOption < oldInstallScript. So for whatever reason, the old install script's mcl lib is the fastest (24ms vs 14ms vs 10ms vs 8ms for VfPolicy in issuer-hiding-cred) šŸ¤”.

Notes

When running without the gmp option

When running with the gmp option

I get

-- Build files have been written to: /tmp/mcl/ffi/java/build
Unknown argument -v
Usage: cmake --build <dir> [options] [-- [native-options]]

Removing the -v fixes this issue.

JanBobolz commented 2 years ago

So right now, it seems to be somewhat hard to figure out which build process is better for which hardware configuration. My suggestion would be that you (1) remove the m1 special cases from your script (make it a bit more readable again), (2) fix the issues I mention above, and then (3) we merge this install script as an alternative install script next to the old one, instructing users to try out which of the two scripts yields the faster library on their machine. Is that a sound plan?

My preference would also be that your new script becomes the "install with gmp" script, i.e. the additional gmp parameter only triggers download and installation of gmp, but then the build would otherwise always proceed in exactly the same way independent of the gmp parameter (it may assume that that gmp has been installed). Is that possible? I noticed during testing that right now, failing builds made me have to restart the script with the gmp option and hence reinstall gmp again and again, even though it's been compiled and installed several times already.

Anyway, thanks for all the work you have put into this already! šŸ™‚ Even though I cannot reproduce the performance gains on my machine, this seems to be quite the leap on your processor, so probably this will help many more people.

WorldofJARcraft commented 2 years ago

I have made the requested changes to the script, restored the old version of the script and added some documentation.
I have no idea what might have caused cp not to copy the library into /usr/lib at the end - the path is hard-coded as absolute path. This can only happen if the script exits prematurely. In this case, the exit code should not have been zero.
Also, I cannot explain why the resulting library when running without second option could be slower than the library created by the old script (short of run-to-run-variance) - the last revision did exactly the same, except that it might have used another commit of mcl (I copied the make commands from the old script; however, I needed to upgrade mcl to incorporate a fix). Are you sure the library was properly installed?

JanBobolz commented 2 years ago

I have no idea what might have caused cp not to copy the library into /usr/lib at the end - the path is hard-coded as absolute path. This can only happen if the script exits prematurely. In this case, the exit code should not have been zero.

For some reason, with those parameters, the library was not compiled into the /tmp/mcl/ffi/java/build dir (I think; this is from memory), but (seemingly) into the /tmp/mcl/lib dir. At some point, I'll test whether such an issue is still there in the new script iteration, but I'm guessing it was just a weird combination of parameters, somehow.

Also, I cannot explain why the resulting library when running without second option could be slower than the library created by the old script (short of run-to-run-variance)

The benchmark numbers were pretty stable.

the last revision did exactly the same, except that it might have used another commit of mcl (I copied the make commands from the old script; however, I needed to upgrade mcl to incorporate a fix).

The copied make command in that script only executed on ARM Macs, but I'm still on an Intel Mac, so I ran into the cmake case. A working theses (which I haven't really tested) is that make doesn't pick up on GMP at all on my machine and uses vint instead, which seems to be faster on newer Intel machines?! Who knows šŸ¤·.

Are you sure the library was properly installed?

Yes, I watched the /usr/lib dir live and saw the library removed and re-added when running the scripts.

JanBobolz commented 2 years ago

Perfect. Thank you for your contribution :)