giaf / blasfeo

Basic linear algebra subroutines for embedded optimization

Error with avx2 when compiling for Intel Haswell architecture #4

Closed ggleizer closed 7 years ago

ggleizer commented 7 years ago

Hi,

I tried compiling BLASFEO with Intel Haswell as target architecture but it threw an error concerning avx2. Here is the log. It works under other Intel architectures but my processor is in the Haswell family.

Thank you very much!

[ggleizer@localhost blasfeo]$ make
rm -f libblasfeo.a
make -C auxiliary clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
rm -f *.o
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/avx'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
rm -f *.o
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make -C kernel clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C avx2 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx2'
make -C avx clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/avx'
make -C sse3 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/sse3'
make -C fma clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/fma'
make -C c99 clean
make[2]: Entering directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
rm -f *.o
rm -f *.s
make[2]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel/c99'
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/kernel'
make -C blas clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/blas'
rm -f *.o
rm -f *.s
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/blas'
make -C test_problems clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/test_problems'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/test_problems'
make -C examples clean
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/examples'
rm -f *.o
rm -f test.out
rm -f libblasfeo.a
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/examples'
touch ./include/blasfeo_target.h
echo "#ifndef TARGET_X64_INTEL_HASWELL" > ./include/blasfeo_target.h
echo "#define TARGET_X64_INTEL_HASWELL" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
echo "#ifndef LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#define LA_HIGH_PERFORMANCE" >> ./include/blasfeo_target.h
echo "#endif" >> ./include/blasfeo_target.h
( cd auxiliary; make obj)
make[1]: Entering directory `/ggleizer/hpmpc/blasfeo/auxiliary'
gcc -O2 -fPIC -m64 -mavx2 -mfma -DTARGET_X64_INTEL_HASWELL -DLA_HIGH_PERFORMANCE -DOS_LINUX -DREF_BLAS_OPENBLAS -I/opt/openblas/include -c -o d_aux_lib4.o d_aux_lib4.c
cc1: error: unrecognized command line option "-mavx2"
make[1]: *** [d_aux_lib4.o] Error 1
make[1]: Leaving directory `/ggleizer/hpmpc/blasfeo/auxiliary'
make: *** [static_library] Error 2

giaf commented 7 years ago

What is your compiler?

ggleizer commented 7 years ago

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

giaf commented 7 years ago

Apparently you need at least gcc 4.7 for that.

You can try replacing -mavx2 with -mavx here https://github.com/giaf/blasfeo/blob/master/Makefile.rule#L121 (and possibly removing -mfma if the compiler complains about it), and see if you can still use the assembly kernels optimized for Haswell.

Otherwise, you can either use a more recent compiler or an older target (losing some performance).

Let me know how it goes :)

ggleizer commented 7 years ago

Thank you, sorry for my ignorance :/ - I'd rather update gcc. Updates soon

ggleizer commented 7 years ago

Looks like I need a newer Red Hat distribution to use gcc 4.8. While I'm downloading the newer version, I tried replacing -mavx2 with -mavx and it didn't work. I also tried removing -mfma and got the same kind of failure (assembler errors, it seems). Curiously, if I use INTEL_CORE it works. Could it be an error in architecture selection?

giaf commented 7 years ago

By Intel Core architecture (not the Core brand name for Core i7, i5, ...), I mean this https://en.wikipedia.org/wiki/Intel_Core_(microarchitecture), which is a rather old architecture. In BLASFEO, the code for this target uses instructions up to SSE4 https://en.wikipedia.org/wiki/SSE4, which in double precision can perform 1 multiplication and 1 addition of 2-wide vectors per clock cycle.

Intel Sandy Bridge introduces the AVX instruction set https://en.wikipedia.org/wiki/Advanced_Vector_Extensions, which in double precision can perform 1 multiplication and 1 addition of 4-wide vectors per clock cycle.

Intel Haswell introduces the AVX2 and FMA3 instruction sets https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply.E2.80.93add, which in double precision can perform 2 fused multiply-adds of 4-wide vectors per clock cycle.
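To make the comparison concrete, the per-cycle figures quoted above can be turned into peak double-precision flops per cycle. The following is only a back-of-the-envelope sketch; the function name peak_flops_per_cycle and its parameters are made up for illustration and are not part of BLASFEO.

```c
#include <stdio.h>

/* Peak double-precision flops per cycle for one core.
 * width:           vector width in doubles (2 for SSE, 4 for AVX)
 * instr_per_cycle: vector arithmetic instructions issued per cycle
 * flops_per_lane:  1 for a plain mul or add, 2 for a fused multiply-add
 * (hypothetical helper, for illustration only) */
static int peak_flops_per_cycle(int width, int instr_per_cycle, int flops_per_lane)
{
    return width * instr_per_cycle * flops_per_lane;
}

/* Prints the theoretical peak for the three targets discussed above. */
static void print_peak_table(void)
{
    /* Core: 1 mul + 1 add per cycle, 2-wide */
    printf("Core:         %d flops/cycle\n", peak_flops_per_cycle(2, 2, 1));
    /* Sandy Bridge: 1 mul + 1 add per cycle, 4-wide */
    printf("Sandy Bridge: %d flops/cycle\n", peak_flops_per_cycle(4, 2, 1));
    /* Haswell: 2 FMA per cycle, 4-wide, 2 flops per lane */
    printf("Haswell:      %d flops/cycle\n", peak_flops_per_cycle(4, 2, 2));
}
```

Under these assumptions each step up doubles the theoretical peak (4, 8, 16), so in practice the gain from one target to the next should stay below 2x.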

Each new architecture supports the old instructions, plus possibly some additions. So using an older target (e.g. Core or Sandy Bridge) works for you, but you are not fully exploiting the new instruction sets.

Old compilers are not aware of recent instruction sets, so even if you have a Haswell processor, the gcc 4.4.7 compiler cannot generate code with AVX2 and FMA instructions.
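One way to see what the compiler (as opposed to the processor) has been told to target is to look at the predefined macros that GCC and Clang set when flags such as -mavx, -mavx2 or -mfma are accepted. The sketch below is not taken from BLASFEO; the function name compiled_isa is hypothetical.

```c
#include <stdio.h>

/* Returns a label for the widest x86 vector ISA enabled at compile time.
 * This reflects the compiler flags in effect, not the CPU the program
 * runs on. (Hypothetical helper, for illustration only.) */
static const char *compiled_isa(void)
{
#if defined(__AVX2__) && defined(__FMA__)
    return "AVX2+FMA";   /* e.g. built with -mavx2 -mfma */
#elif defined(__AVX__)
    return "AVX";        /* e.g. built with -mavx */
#elif defined(__SSE3__)
    return "SSE3";       /* e.g. built with -msse3 */
#else
    return "generic";    /* none of the above flags in effect */
#endif
}
```

Calling printf("%s\n", compiled_isa()) from a small main shows which branch was selected for a given set of flags. A compiler that rejects -mavx2 outright, as in the log above, never gets as far as defining __AVX2__.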

In any case, even with an old Linux distro you can always install more recent versions of gcc from source.

ggleizer commented 7 years ago

Thank you. Yeah, so in my case it is Haswell indeed.

I'm a very poor Linux user, so I keep trying to figure things out from forums. I'll give it one more try on installing newer versions from source as you suggested. Sorry to keep giving you trouble.

giaf commented 7 years ago

No problem, I'm glad to help.

Otherwise you can simply use the Sandy Bridge target. It's not the best one, but the speed-up from using the Haswell target is always less than 2x.

ggleizer commented 7 years ago

Sandy Bridge works fine too. I'll work on the newer gcc and report back if I'm successful with Haswell then.

ggleizer commented 7 years ago

GCC updated to 4.8 and BLASFEO now compiles with Haswell target!

Thank you so much!

RoyiAvital commented 4 years ago

@giaf, I think the ISA tests aren't good enough. They only check whether the processor is capable of the ISA (which you could do with cpu_id()). You should also check that the correct headers are set. So at least add an ISA instruction like _mm256_loadu_pd or _mm_loadu_pd. It would be better to select instructions for AVX2 as well (something with integers).

imciner2 commented 4 years ago

> They only check if the processor is capable of the ISA (Which you could do with cpu_id())

This is provided as a runtime function, blasfeo_processor_cpu_features, which the user of the library can call to see if the current processor supports the compiled version.
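For readers unfamiliar with this kind of runtime check, here is a generic sketch of the idea using the GCC/Clang builtin __builtin_cpu_supports. It is not BLASFEO's implementation and does not use its API; the function name cpu_has_avx2 is hypothetical.

```c
#include <stdio.h>

/* Returns 1 if the running CPU reports AVX2 support, 0 if it does not,
 * and -1 if this compiler or architecture offers no such query.
 * (Hypothetical helper; relies on a GCC/Clang builtin on x86.) */
static int cpu_has_avx2(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();                          /* set up the CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0; /* query the running CPU */
#else
    return -1;                                     /* no runtime query available here */
#endif
}
```

A program could call this once at startup and refuse to run, or pick another code path, when the library was built for a target the processor does not support.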

> So at least add an ISA command like: _mm256_loadu_pd or _mm_loadu_pd.

The ISA tests directly compile the assembly mnemonics to test for support of the requested architecture, since the BLASFEO source uses assembly for the target-specific kernels instead of intrinsics. I don't think it uses any intrinsics in the code, so it shouldn't need to include special headers, but @giaf would know better than I about that.

RoyiAvital commented 4 years ago

That's not the case. For instance, try to compile the project on Skylake without the flag -mavx2. You will get an error.

The current CMakeLists.txt adds the flags, but it would be better for the tests to check that the correct flags are indeed added and effective.
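A probe along these lines could look like the sketch below, which exercises _mm256_loadu_pd through the immintrin.h header. The names sum4 and probe_uses_avx are made up for illustration, and this is not the test that ended up in the repository. The scalar fallback is only there so the sketch builds everywhere; a real build-system probe would presumably drop it, so that a missing flag or header makes the compilation fail.

```c
#include <stdio.h>
#if defined(__AVX__)
#include <immintrin.h>   /* AVX intrinsics; only usable when AVX is enabled */
#endif

/* Reports whether the AVX code path below was compiled in. */
static int probe_uses_avx(void)
{
#if defined(__AVX__)
    return 1;
#else
    return 0;
#endif
}

/* Sums four doubles, going through AVX intrinsics when available. */
static double sum4(const double *x)
{
#if defined(__AVX__)
    double t[4];
    __m256d v = _mm256_loadu_pd(x);   /* unaligned 4-wide load */
    _mm256_storeu_pd(t, v);           /* unaligned 4-wide store */
    return t[0] + t[1] + t[2] + t[3];
#else
    return x[0] + x[1] + x[2] + x[3]; /* scalar fallback for other builds */
#endif
}
```

Building this with and without -mavx and printing probe_uses_avx() shows whether the flag actually reached the compiler.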

imciner2 commented 4 years ago

Hmm, there are some AVX intrinsics in the code apparently. I had thought everything was done in pure assembly. It should be easy for me to add those to the ISA tests though.

giaf commented 4 years ago

Yes, some easy stuff like BLAS 1 routines is vectorized with intrinsics instead of pure assembly.

@imciner2 great if you can do that! BTW your ISA tests turned out to be a great feature for using BLASFEO in acados :)

giaf commented 4 years ago

BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag, otherwise the assembler complains. But in this case no headers are needed.

imciner2 commented 4 years ago

> great if you can do that!

Done in PR https://github.com/giaf/blasfeo/pull/122 for the X86 intrinsics (which are the only ones used in the code currently).

> BTW your ISA tests turned out to be a great feature to use BLASFEO in acados :)

Great to hear. That was the main reason I developed them, since I remember the issues that simply setting a high default target caused on some people's computers when installing it.

> BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag otherwise the assembler would complain. But in this case no headers are needed.

Yeah, some ARM architectures get really picky about the flags needed to compile, unfortunately.