OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

Openblas build fail on Skylake #3591

Closed hpcnpatel closed 2 years ago

hpcnpatel commented 2 years ago

I wanted to report a build error.

I am using Spack to build OpenBLAS (versions 0.3.17 to 0.3.20) and get the build failure shown below.

The error appears with the Intel compilers (versions 2018 through 2022) on Skylake_avx512.

==> Installing openblas-0.3.20-jymnrwz5nbr2omxjeu36rsiitstkvs24
==> No binary for openblas-0.3.20-jymnrwz5nbr2omxjeu36rsiitstkvs24 found: installing from source
==> Using cached archive: ~/cache/_source-cache/archive/84/8495c9affc536253648e942908e88e097f2ec7753ede55aca52e5dead3029e3c.tar.gz
==> No patches needed for openblas
==> openblas: Executing phase: 'edit'
==> openblas: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j24' 'CC=~/spack/lib/spack/env/intel/icc' 'FC=~/spack/lib/spack/env/intel/ifort' 'MAKE_NB_JOBS=0' 'ARCH=x86_64' 'TARGET=SKYLAKEX' 'USE_LOCKING=1' 'USE_OPENMP=0' 'USE_THREAD=0' 'RANLIB=ranlib' 'libs' 'netlib' 'shared'

4 errors found in build log:
     2925    icc: command line warning #10121: overriding '-march=skylake-avx512' with '-march=skylake-avx512'
     2926    icc: command line warning #10121: overriding '-march=skylake-avx512' with '-march=skylake-avx512'
     2927    ~/spack/lib/spack/env/intel/icc -O2 -DSMALL_MATRIX_OPT -DMAX_STACK_ALLOC=2048 -DUSE_LOCKING -wd981 -DF_INTERFACE_INTEL -
             fPIC -DNO_WARMUP -DMAX_CPU_NUMBER=40 -DMAX_PARALLEL_NUMBER=1 -DBUILD_SINGLE=1 -DBUILD_DOUBLE=1 -DBUILD_COMPLEX=1 -DBUILD_COMPLEX16=1 -DVERSION=\"0.3.20\" -msse3 -mssse3 -m
             sse4.1 -mavx -mavx2 -march=skylake-avx512 -UASMNAME -UASMFNAME -UNAME -UCNAME -UCHAR_NAME -UCHAR_CNAME -DASMNAME=cgemm_kernel_r -DASMFNAME=cgemm_kernel_r_ -DNAME=cgemm_ker
             nel_r_ -DCNAME=cgemm_kernel_r -DCHAR_NAME=\"cgemm_kernel_r_\" -DCHAR_CNAME=\"cgemm_kernel_r\" -DNO_AFFINITY -I.. -UDOUBLE  -DCOMPLEX -c -UDOUBLE -DCOMPLEX -DNC ../kernel/x
             86_64/cgemm_kernel_8x2_skylakex.c -o cgemm_kernel_r.o
     2928    icc: command line warning #10121: overriding '-march=skylake-avx512' with '-march=skylake-avx512'
     2929    icc: command line warning #10121: overriding '-march=skylake-avx512' with '-march=skylake-avx512'
     2930    icc: command line warning #10121: overriding '-march=skylake-avx512' with '-march=skylake-avx512'
  >> 2931    ../kernel/x86_64/dgemm_small_kernel_nn_skylakex.c(100) (col. 1): internal error: 04010003_1159
     2932    
     2933    compilation aborted for ../kernel/x86_64/dgemm_small_kernel_nn_skylakex.c (code 4)
  >> 2934    make[1]: *** [Makefile.L3:4658: dgemm_small_kernel_nn.o] Error 4
     2935    make[1]: *** Waiting for unfinished jobs....
  >> 2936    ../kernel/x86_64/dgemm_small_kernel_nn_skylakex.c(100) (col. 1): internal error: 04010003_1159
     2937    
     2938    compilation aborted for ../kernel/x86_64/dgemm_small_kernel_nn_skylakex.c (code 4)
  >> 2939    make[1]: *** [Makefile.L3:4686: dgemm_small_kernel_b0_nn.o] Error 4
     2940    make[1]: Leaving directory '~/tmp/skylake_avx512/spack-stage-openblas-0.3.20-jymnrwz5nbr2omxjeu36rsiitstkvs24/spack-src/kernel'
     2941    make: *** [Makefile:170: libs] Error 1

Note: I am also installing OpenBLAS on a Haswell machine and do not get this error there.

I am not sure whether this error comes from building OpenBLAS with the Intel compiler on the Skylake architecture or whether it is Spack related. Any help or tip is appreciated.

martin-frbg commented 2 years ago

This appears to be an internal error in the Intel compiler. If this is reproducible (i.e. not a fluke due to bad RAM etc.) and you do not want to use a different compiler, the easiest workaround is probably to disable SMALL_MATRIX_OPT on line 263 of Makefile.system. (It should probably be possible to override it on the command line, but nobody expected this to be an issue.)
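
The workaround above could be sketched as follows. This is a minimal sketch and assumes the option is switched on in Makefile.system by an assignment that begins with `SMALL_MATRIX_OPT =` (the exact text and line number may differ between releases, so the helper prints the matching lines first). The function name `disable_small_matrix_opt` is hypothetical and not part of OpenBLAS.

```shell
#!/bin/sh
# Hypothetical helper: comment out the SMALL_MATRIX_OPT assignment in an
# OpenBLAS Makefile.system.
# Assumption: the option is enabled by a line starting with "SMALL_MATRIX_OPT =".
disable_small_matrix_opt() {
    makefile="$1"
    # Show every mention of the option so the edit can be checked by eye.
    grep -n 'SMALL_MATRIX_OPT' "$makefile"
    # Keep a backup, then comment out only the assignment itself.
    # Lines that merely test the variable (ifeq ...) are left untouched.
    cp "$makefile" "$makefile.orig"
    sed 's/^\([[:space:]]*\)\(SMALL_MATRIX_OPT[[:space:]]*=.*\)$/\1# \2/' \
        "$makefile.orig" > "$makefile"
}
```

Run from the top of the source tree as `disable_small_matrix_opt Makefile.system`, then rebuild. Commenting the assignment out, rather than setting it to another value, should leave the variable undefined regardless of how the rest of the build tests it.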

martin-frbg commented 2 years ago

@guowangy should this be reported to the ICC team, or can you forward it internally? (Neither gcc nor I see anything wrong with your code from #3335.)

brada4 commented 2 years ago

You might want to bisect the issue more accurately and submit a workaround to the Spack package.

martin-frbg commented 2 years ago

@brada4 exactly what would you want to bisect here?

brada4 commented 2 years ago

I want the OP to report/fix this in Spack too.

hpcnpatel commented 2 years ago

> This appears to be an internal error in the Intel compiler. If this is reproducible (i.e. not a fluke due to bad RAM etc.) and you do not want to use a different compiler, the easiest workaround is probably to disable SMALL_MATRIX_OPT on line 263 of Makefile.system. (It should probably be possible to override it on the command line, but nobody expected this to be an issue.)

Yes, this is reproducible. We build OpenBLAS with both the Intel compilers and GCC, as we cater to the needs of users on supercomputers and clusters.

Thanks for the workaround, I will try it locally.

martin-frbg commented 2 years ago

This was actually fixed already by #3550, shortly after 0.3.20 was released (though I think this was primarily done for LLVM). The culprit responsible for the internal compiler error appears to have been the spurious asm("k1") in the declaration of the mask variable on line 269.
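
One rough way to see whether a given source tree still carries that register binding is to search the Skylake-X kernel sources for the `asm("k1")` text quoted above. This is only a sketch: the helper name `find_k1_binding` is hypothetical, and an empty result shows that the pattern is gone, not that every change from #3550 is present.

```shell
#!/bin/sh
# Hypothetical helper: list *skylakex*.c sources in a directory that still
# contain the literal text asm("k1").
# Prints "file:line:text" for each hit; exit status is 0 if any hit is found
# and non-zero otherwise (grep's own convention).
find_k1_binding() {
    kernel_dir="$1"
    # -F treats the pattern as a fixed string, -n adds line numbers.
    grep -nF 'asm("k1")' "$kernel_dir"/*skylakex*.c
}
```

From the top of the source tree this would be called as `find_k1_binding kernel/x86_64`, the directory that holds the files named in the build log.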