clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Precompile kernels - cannot allocate memory #208

Closed mpekalski closed 8 years ago

mpekalski commented 8 years ago

I tried to precompile all the kernels, but I ran out of memory (g++ and gcc are both v5.2.1).

[ 24%] Building CXX object library/CMakeFiles/clBLAS.dir/__/include/AutoGemmIncludes/AutoGemmKernelBinaries.cpp.o
virtual memory exhausted: Cannot allocate memory
library/CMakeFiles/clBLAS.dir/build.make:3813: recipe for target 'library/CMakeFiles/clBLAS.dir/__/include/AutoGemmIncludes/AutoGemmKernelBinaries.cpp.o' failed
make[2]: *** [library/CMakeFiles/clBLAS.dir/__/include/AutoGemmIncludes/AutoGemmKernelBinaries.cpp.o] Error 1
CMakeFiles/Makefile2:226: recipe for target 'library/CMakeFiles/clBLAS.dir/all' failed
make[1]: *** [library/CMakeFiles/clBLAS.dir/all] Error 2
Makefile:136: recipe for target 'all' failed
make: *** [all] Error 2

It is kind of funny, because I have 32GB of RAM and there were still 11GB free a second before it crashed:

free -m -s1

             total       used       free     shared    buffers     cached
Mem:         32114      20789      11324         27         93       2132
-/+ buffers/cache:      18562      13551
Swap:         2407          0       2407

and ulimit -v

unlimited

and the CMake command I used

cmake ../src -DOPENCL_VERSION:STRING=2.0 -DACML_INCLUDE_DIRS:PATH=/opt/acml5.3.1/gfortran64_mp/include -DACML_LIBRARIES:FILEPATH=/opt/acml5.3.1/gfortran64_mp/lib/libacml_mp.so -DBLAS_DEBUG_TOOLS=ON -DOPENCL_OFFLINE_BUILD_HAWAII_KERNEL=ON -DBUILD_PERFORMANCE=ON -DCMAKE_INSTALL_PREFIX=/opt/clBLAS -DBUILD_SHARED_LIBS=ON -DUSE_SYSTEM_GTEST=ON -DOPENCL_INCLUDE_DIRS=/opt/opencl-headers/include -DPRECOMPILE_GEMM_PRECISION_DGEMM=ON -DPRECOMPILE_GEMM_PRECISION_ZGEMM=ON  -DPRECOMPILE_GEMM_PRECISION_SGEMM=ON -DPRECOMPILE_GEMM_PRECISION_CGEMM=ON -DPRECOMPILE_GEMM_TRANS_NN=ON -DPRECOMPILE_GEMM_TRANS_NT=ON -DPRECOMPILE_GEMM_TRANS_NC=ON -DPRECOMPILE_GEMM_TRANS_TN=ON -DPRECOMPILE_GEMM_TRANS_TT=ON -DPRECOMPILE_GEMM_TRANS_TC=ON -DPRECOMPILE_GEMM_TRANS_CN=ON -DPRECOMPILE_GEMM_TRANS_CT=ON -DPRECOMPILE_GEMM_TRANS_CC=ON

TimmyLiu commented 8 years ago

Pre-compiling all the kernels into the same binary is not advisable. I tried it with Visual Studio and ran out of heap memory. GCC might impose similar limits. I would recommend pre-compiling only the subroutines that you will be using.
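
To illustrate that recommendation, here is a minimal sketch of a helper that builds the CMake flags for a chosen subset of GEMM kernels. The flag names are copied from the cmake command quoted earlier in this thread. The helper itself (`precompile_flags`) is hypothetical and not part of clBLAS, and the DGEMM/NN/NT selection is only an example.

```python
# Hypothetical helper (not part of clBLAS): build -D flags for a subset of
# GEMM kernels. Flag names are taken from the cmake command in this thread.
PRECISIONS = {"S", "D", "C", "Z"}
TRANSPOSES = {"NN", "NT", "NC", "TN", "TT", "TC", "CN", "CT", "CC"}


def precompile_flags(precisions, transposes):
    """Return a sorted list of CMake flags for the requested subset."""
    flags = []
    for p in sorted(precisions):
        if p not in PRECISIONS:
            raise ValueError(f"unknown precision: {p}")
        flags.append(f"-DPRECOMPILE_GEMM_PRECISION_{p}GEMM=ON")
    for t in sorted(transposes):
        if t not in TRANSPOSES:
            raise ValueError(f"unknown transpose pair: {t}")
        flags.append(f"-DPRECOMPILE_GEMM_TRANS_{t}=ON")
    return flags


# Example selection: double-precision GEMM, NN and NT only.
print(" ".join(precompile_flags({"D"}, {"NN", "NT"})))
```

The printed flags would replace the full list of PRECOMPILE_GEMM_* options on the cmake command line. The other options stay unchanged.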

mpekalski commented 8 years ago

I did not know what I would be using, so I thought I would try compiling everything. But thanks for confirming that nothing is wrong with my system.

TimmyLiu commented 8 years ago

Kernels that are not precompiled will be compiled at run time and cached. This comes with a performance cost: the first call of each kind has to wait for the kernel to compile. Of course, a more elegant solution would be a logger that records which routines a particular user calls, and with which parameters, so that only those kernels of interest are precompiled.
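
The logger idea could look roughly like the sketch below. It is an assumption about how such a tool might work, not an existing clBLAS feature: the class `GemmCallLog` and its methods are hypothetical, and the application would have to call `record` itself around each GEMM call.

```python
from collections import Counter


class GemmCallLog:
    """Hypothetical call logger: counts GEMM calls by precision and transposes."""

    def __init__(self):
        self.counts = Counter()

    def record(self, precision, trans_a, trans_b):
        # precision is one of S, D, C, Z; trans_a / trans_b are N, T, or C.
        key = (precision.upper(), trans_a.upper() + trans_b.upper())
        self.counts[key] += 1

    def kernels_of_interest(self):
        """Distinct (precision, transpose pair) combinations seen so far."""
        return sorted(self.counts)


log = GemmCallLog()
log.record("D", "N", "N")
log.record("D", "N", "T")
log.record("D", "N", "N")
print(log.kernels_of_interest())
```

Each distinct pair in the result corresponds to one precision flag and one transpose flag from the cmake command above, so the log from a typical run shows which kernels are worth precompiling.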

TimmyLiu commented 8 years ago

Hi, can we close this issue?

mpekalski commented 8 years ago

Yes, please.