Closed: seheracer closed this issue 4 years ago
Hi! It looks like you are building on a PowerPC architecture, but the code was written for x86. One problem (search for "fatal error" in your log) is that I include an x86 header file for special instructions. The other errors seem to be related to the use of the lzcnt instruction (see the -mlzcnt error). I don't have a PowerPC machine available, so I can't fix the issue on my own, but I hope these two hints help. You could either look for a PowerPC equivalent of these instructions or simply count the leading zeros manually; that should not have a significant performance impact anyway.
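As a rough illustration of the "count the leading zeros manually" suggestion, here is a minimal sketch in plain C++ with no architecture-specific instructions. The function name `count_leading_zeros_u32` is hypothetical and not part of spECK; it is assumed here that the code needs a 32-bit count and that returning 32 for a zero input (as the x86 lzcnt instruction does) is the desired behaviour.

```cpp
#include <cstdint>

// Hypothetical helper, not part of spECK: counts the leading zero bits of a
// 32-bit value using only portable C++.
// Returns 32 for x == 0, which matches what the x86 lzcnt instruction yields.
inline std::uint32_t count_leading_zeros_u32(std::uint32_t x) {
    std::uint32_t n = 0;
    // Walk a one-bit mask down from the most significant bit and stop at the
    // first set bit of x (or when the mask has been shifted out entirely).
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}
```

The loop runs at most 32 iterations, so unless the call sits in a very hot host-side path the cost should be small.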
Thanks @dabeschte for the information. I don't have time to fix these errors for PowerPC.
The most commonly used supercomputers at the US national labs have PowerPC processors. Not running on those machines may hurt the popularity of your code, just saying...
@seheracer should be fixed now
@dabeschte
I ran into the following build error on Summit (Volta V100 GPUs): https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/. I'd appreciate any help resolving this issue. The steps I performed are listed below.
1) Loaded the following modules: cuda/10.1.105 gcc/7.4.0 cmake/3.17.3
2) git clone git@github.com:GPUPeople/spECK.git
cd spECK
3) Changed the following line in include/Multiply.h:
static constexpr int spECK_DYNAMIC_MEM_PER_BLOCK{49152};
to:
static constexpr int spECK_DYNAMIC_MEM_PER_BLOCK{98304};
4) Didn't change the compute capability (which is already CC70 for V100).
5) bash-4.2$ chmod 700 linuxsetup.sh
bash-4.2$ ./linuxsetup.sh
Setup Speck with Compute Capability CC70
-- The CXX compiler identification is GNU 7.4.0
-- The CUDA compiler identification is NVIDIA 10.1.105
-- Check for working CXX compiler: /sw/summit/gcc/7.4.0/bin/c++
-- Check for working CXX compiler: /sw/summit/gcc/7.4.0/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /sw/summit/cuda/10.1.105/bin/nvcc
-- Check for working CUDA compiler: /sw/summit/cuda/10.1.105/bin/nvcc - works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Build type not specified: Use Release by default
-- Configuring done
-- Generating done
-- Build files have been written to: /gpfs/alpine/csc318/scratch/sacer/spECK/build
Scanning dependencies of target spECKLib
[ 13%] Building CXX object CMakeFiles/spECKLib.dir/source/GPU/memory.cpp.o
[ 20%] Building CUDA object CMakeFiles/spECKLib.dir/source/GPU/Transpose.cu.o
[ 26%] Building CUDA object CMakeFiles/spECKLib.dir/source/GPU/Compare.cu.o
[ 33%] Building CXX object CMakeFiles/spECKLib.dir/source/COO.cpp.o
[ 40%] Building CXX object CMakeFiles/spECKLib.dir/source/CSR.cpp.o
[ 40%] Building CXX object CMakeFiles/spECKLib.dir/source/dCSR.cpp.o
[ 46%] Building CXX object CMakeFiles/spECKLib.dir/source/Config.cpp.o
c++: error: unrecognized command line option '-mlzcnt'
c++: error: unrecognized command line option '-mlzcnt'
c++: error: unrecognized command line option '-mlzcnt'
make[2]: [CMakeFiles/spECKLib.dir/source/dCSR.cpp.o] Error 1
c++: error: unrecognized command line option '-mlzcnt'
make[2]: Waiting for unfinished jobs....
make[2]: [CMakeFiles/spECKLib.dir/source/GPU/memory.cpp.o] Error 1
make[2]: [CMakeFiles/spECKLib.dir/source/CSR.cpp.o] Error 1
make[2]: [CMakeFiles/spECKLib.dir/source/COO.cpp.o] Error 1
[ 53%] Building CUDA object CMakeFiles/spECKLib.dir/source/GPU/Multiply.cu.o
c++: error: unrecognized command line option '-mlzcnt'
make[2]: [CMakeFiles/spECKLib.dir/source/Config.cpp.o] Error 1
/gpfs/alpine/csc318/scratch/sacer/spECK/source/GPU/Multiply.cu:10:10: fatal error: x86intrin.h: No such file or directory
#include <x86intrin.h>
compilation terminated.
make[2]: [CMakeFiles/spECKLib.dir/source/GPU/Multiply.cu.o] Error 1
ptxas info : 215 bytes gmem
ptxas info : Compiling entry function '_Z9d_compareIdEviiPKjS1_PKT_S1_S1_S4_bdPj' for 'sm_70'
ptxas info : Function properties for _Z9d_compareIdEviiPKjS1_PKT_S1_S1_S4_bdPj
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 432 bytes cmem[0], 8 bytes cmem[2]
ptxas info : Compiling entry function '_Z9d_compareIfEviiPKjS1_PKT_S1_S1_S4_bdPj' for 'sm_70'
ptxas info : Function properties for _Z9d_compareIfEviiPKjS1_PKT_S1_S1_S4_bdPj
    40 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 432 bytes cmem[0], 8 bytes cmem[2]
ptxas info : 14 bytes gmem
ptxas info : Compiling entry function '_ZN6thrust8cuda_cub4core13_kernel_agentINS0_14parallel_for16ParallelForAgentINS0_11transform17unary_transform_fINS_10device_ptrIjEEPjNS5_14no_stencil_tagENS_8identityIjEENS5_21always_true_predicateEEElEESE_lEEvT0T1' for 'sm_70'
ptxas info : Function properties for _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14parallel_for16ParallelForAgentINS0_11transform17unary_transform_fINS_10device_ptrIjEEPjNS5_14no_stencil_tagENS_8identityIjEENS5_21always_true_predicateEEElEESE_lEEvT0T1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_ZN6thrust8cuda_cub4core13_kernel_agentINS0_14parallel_for16ParallelForAgentINS0_11transform17unary_transform_fIPKjNS_10device_ptrIjEENS5_14no_stencil_tagENS_8identityIjEENS5_21always_true_predicateEEElEESF_lEEvT0T1' for 'sm_70'
ptxas info : Function properties for _ZN6thrust8cuda_cub4core13_kernel_agentINS0_14parallel_for16ParallelForAgentINS0_11transform17unary_transform_fIPKjNS_10device_ptrIjEENS5_14no_stencil_tagENS_8identityIjEENS5_21always_true_predicateEEElEESF_lEEvT0T1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 384 bytes cmem[0]
ptxas info : Compiling entry function '_ZN6thrust8cuda_cub4core13_kernel_agentINS0_6scan9ScanAgentINS_10device_ptrIjEES6_NS_4plusIjEEijNS_6detail17integral_constantIbLb0EEEEES6_S6_S8_iNS0_3cub13ScanTileStateIjLb1EEENS3_22AddInitToExclusiveScanIjS8_EEEEvT0_T1_T2_T3_T4T5' for 'sm_70'
ptxas info : Function properties for _ZN6thrust8cuda_cub4core13_kernel_agentINS0_6scan9ScanAgentINS_10device_ptrIjEES6_NS_4plusIjEEijNS_6detail17integral_constantIbLb0EEEEES6_S6_S8_iNS0_3cub13ScanTileStateIjLb1EEENS3_22AddInitToExclusiveScanIjS8_EEEEvT0_T1_T2_T3_T4T5
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 54 registers, 392 bytes cmem[0]
ptxas info : Compiling entry function '_ZN6thrust8cuda_cub4core13_kernel_agentINS0_6scan9InitAgentINS0_3cub13ScanTileStateIjLb1EEEiEES7_iEEvT0T1' for 'sm_70'
ptxas info : Function properties for _ZN6thrust8cuda_cub4core13_kernel_agentINS0_6scan9InitAgentINS0_3cub13ScanTileStateIjLb1EEEiEES7_iEEvT0T1
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 10 registers, 364 bytes cmem[0]
ptxas info : Compiling entry function '_ZN6thrust8cuda_cub3cub11EmptyKernelIvEEvv' for 'sm_70'
ptxas info : Function properties for _ZN6thrust8cuda_cub3cub11EmptyKernelIvEEvv
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z16d_writeTransposeIdEviiPKjS1_PKT_PjS5_PS2_S5S5' for 'sm_70'
ptxas info : Function properties for _Z16d_writeTransposeIdEviiPKjS1_PKT_PjS5_PS2_S5S5
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z14d_findPositionIdEviiPKjS1_PKT_PjS5_PS2_S5S5' for 'sm_70'
ptxas info : Function properties for _Z14d_findPositionIdEviiPKjS1_PKT_PjS5_PS2_S5S5
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z16d_writeTransposeIfEviiPKjS1_PKT_PjS5_PS2_S5S5' for 'sm_70'
ptxas info : Function properties for _Z16d_writeTransposeIfEviiPKjS1_PKT_PjS5_PS2_S5S5
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z14d_findPositionIfEviiPKjS1_PKT_PjS5_PS2_S5S5' for 'sm_70'
ptxas info : Function properties for _Z14d_findPositionIfEviiPKjS1_PKT_PjS5_PS2_S5S5
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 26 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z31d_calulateTransposeDistributioniiPKjS0_Pj' for 'sm_70'
ptxas info : Function properties for _Z31d_calulateTransposeDistributioniiPKjS0_Pj
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 384 bytes cmem[0]
make[1]: [CMakeFiles/spECKLib.dir/all] Error 2
make: *** [all] Error 2
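The log shows two distinct failures: the host compiler on PowerPC rejects the x86-only -mlzcnt option, and x86intrin.h does not exist there. One possible way to handle the header side is sketched below, assuming a GCC- or Clang-compatible host compiler. The wrapper name `portable_lzcnt32` is hypothetical and not part of spECK, and this is not necessarily how the project resolved the issue. The -mlzcnt flag would presumably also have to be passed only on x86 in the build configuration.

```cpp
#include <cstdint>

// Include the x86 intrinsics header only when targeting x86, so that other
// architectures (e.g. PowerPC) never see it.
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Hypothetical wrapper, not part of spECK: returns the number of leading zero
// bits in x, and 32 when x == 0.
inline std::uint32_t portable_lzcnt32(std::uint32_t x) {
#if defined(__LZCNT__)
    // __LZCNT__ is defined by GCC/Clang when -mlzcnt (or a matching -march)
    // is in effect, so the intrinsic is available here.
    return _lzcnt_u32(x);
#elif defined(__GNUC__) || defined(__clang__)
    // __builtin_clz is undefined for 0, so handle that case explicitly.
    return x == 0 ? 32u : static_cast<std::uint32_t>(__builtin_clz(x));
#else
    // Generic fallback for other compilers: scan from the top bit down.
    std::uint32_t n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
#endif
}
```

With a wrapper like this, call sites would use `portable_lzcnt32(value)` instead of the raw intrinsic, and the compiler picks whichever branch applies to the target.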