Error while calling cudnnGetConvolutionForwardWorkspaceSize (CUDNN_STATUS_NOT_SUPPORTED) when using dlib dnn detector in multiple processes on the same machine #2215
I'm wondering if this is a cuDNN multi-process issue rather than a dlib issue, or it may be the way I'm creating and using the detector...
Expected Behavior
Thread-safe face detection and descriptor extraction using cuDNN on CUDA with the CNN detector.
Current Behavior
I'm running a face detection and descriptor extraction program. The code of interest is as follows (full code at the bottom):
void extract_faces_cnn(std::vector<dlib::matrix<rgb_pixel>> &faces, std::vector<dlib::rectangle> &coords, dlib::cv_image<rgb_pixel> &img, shape_predictor &sp) {
    matrix<rgb_pixel> imgmat = mat(img);
    auto dets = cnn_detector(imgmat);
    for (auto&& face : dets)
    {
        auto shape = sp(imgmat, face);
        matrix<rgb_pixel> face_chip;
        extract_image_chip(img, get_face_chip_details(shape,150,0.25), face_chip);
        faces.push_back(move(face_chip));
        coords.push_back(face.rect);
    }
    cnn_detector.clean(); //Seem to need this otherwise memory usage creeps up and up and gobbles all the GPU mems.
}
The code is called from an Apache Storm bolt written in Java (so it's part of a JNI wrapper). This all works fine so long as there's only one instance of the bolt. dlib works perfectly, and is nice and fast and we get lovely descriptors. Awesome.
If more than one instance of the bolt is created (each one is created and run in its own JVM), we start getting this error:
2020-10-15 14:59:38.370 STDERR Thread-0 [INFO] terminate called after throwing an instance of 'dlib::cudnn_error'
2020-10-15 14:59:38.371 STDERR Thread-0 [INFO] what(): Error while calling cudnnGetConvolutionForwardWorkspaceSize( context(), descriptor(data), (const cudnnFilterDescriptor_t)filter_handle, (const cudnnConvolutionDescriptor_t)conv_handle, descriptor(dest_desc), (cudnnConvolutionFwdAlgo_t)forward_algo, &forward_workspace_size_in_bytes) in file /home/ubuntu/dlib/dlib/cuda/cudnn_dlibapi.cpp:1026. code: 9, reason: CUDNN_STATUS_NOT_SUPPORTED
Steps to Reproduce
On a g4dn.4xlarge instance, install CUDA 11.1 and cuDNN 8 (8_8.0.4.30-1+cuda11.1_amd64 and libcudnn8-dev_8.0.4.30-1+cuda11.1_amd64 deb packages, to be exact).
-- The C compiler identification is GNU 8.4.0
-- The CXX compiler identification is GNU 8.4.0
-- Check for working C compiler: /usr/bin/gcc-8
-- Check for working C compiler: /usr/bin/gcc-8 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-8
-- Check for working CXX compiler: /usr/bin/g++-8 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Using CMake version: 3.16.3
-- Compiling dlib version: 19.21.99
-- Enabling AVX instructions
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found X11: /usr/include
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so
-- Looking for XOpenDisplay in /usr/lib/x86_64-linux-gnu/libX11.so;/usr/lib/x86_64-linux-gnu/libXext.so - found
-- Looking for gethostbyname
-- Looking for gethostbyname - found
-- Looking for connect
-- Looking for connect - found
-- Looking for remove
-- Looking for remove - found
-- Looking for shmat
-- Looking for shmat - found
-- Looking for IceConnectionNumber in ICE
-- Looking for IceConnectionNumber in ICE - found
-- Found system copy of libpng: /usr/lib/x86_64-linux-gnu/libpng.so;/usr/lib/x86_64-linux-gnu/libz.so
-- Searching for BLAS and LAPACK
-- Searching for BLAS and LAPACK
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.1")
-- Checking for module 'cblas'
-- No package 'cblas' found
-- Checking for module 'lapack'
-- Found lapack, version 0.3.8+ds
-- Looking for cblas_ddot
-- Looking for cblas_ddot - not found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- Found OpenBLAS library
-- Looking for sgetrf_single
-- Looking for sgetrf_single - found
-- Using OpenBLAS's built in LAPACK
-- Looking for cblas_ddot
-- Looking for cblas_ddot - found
-- Looking for sgesv
-- Looking for sgesv - not found
-- Looking for sgesv_
-- Looking for sgesv_ - not found
-- Found CUDA: /usr/local/cuda (found suitable version "11.1", minimum required is "7.5")
-- Looking for cuDNN install...
-- Found cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
-- Building a CUDA test project to see if your compiler is compatible with CUDA...
-- Building a cuDNN test project to check if you have the right version of cuDNN installed...
-- Enabling CUDA support for dlib. DLIB WILL USE CUDA
-- C++11 activated.
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ubuntu/dlib/build
sudo apt-get install gfortran libopenblas-dev liblapack-dev gcc-8 g++-8
git clone https://github.com/davisking/dlib.git && cd dlib && mkdir build && cd build
cmake -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8 -DUSE_AVX_INSTRUCTIONS=ON -DBUILD_SHARED_LIBS=1 -DCUDA_HOST_COMPILER=/usr/bin/gcc-8 ../
Output is the CMake log shown above, ending with "Build files have been written to: /home/ubuntu/dlib/build".
cmake --build . --config Release
make install
cmake -DCMAKE_C_COMPILER=/usr/bin/gcc-8 -DCMAKE_CXX_COMPILER=/usr/bin/g++-8 -DUSE_AVX_INSTRUCTIONS=ON ../ && cmake --build . --config Release
sudo cp *.so /usr/local/lib/
git clone https://github.com/davisking/dlib-models.git && cp /home/ubuntu/dlib-models/*bz2 /opt/dlib/models/ && bzip2 -d /opt/dlib/models/*.bz2
Full C++ Code:
CMakeLists.txt used to build it: