RuntimeError: Error while calling cudaOccupancyMaxPotentialBlockSize() reason: invalid device function

Professor-Paradox commented 3 years ago

face_recognition version: 1.3.0
Python version: 3.8.4
Operating System: Ubuntu 20.04
DLIB version: 19.21.0
Description

testing the face recognition library

What I Did

import face_recognition as FR

path="c.jpg"
image = FR.load_image_file(path)
face_encodings = FR.face_encodings(image)[0]
print(face_encodings)

Installed CUDA 11 and the cuDNN build for CUDA 11, with their libraries, and NVIDIA driver 450, on a GT 730 GPU with compute capability 3.5.

The mnistCUDNN test passes, but the Python script returns the error above. I tried other pictures too and still couldn't get the encodings.

IMPORTANT: If your issue is related to a specific picture, include it so others can reproduce the issue.

Error raised

Traceback (most recent call last):
  File "/home/t/Desktop/machinelearning/test.py", line 6, in <module>
    face_encodings = FR.face_encodings(image)[0]
  File "/home/t/Desktop/machinelearning/env/lib/python3.8/site-packages/face_recognition/api.py", line 214, in face_encodings
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
  File "/home/t/Desktop/machinelearning/env/lib/python3.8/site-packages/face_recognition/api.py", line 214, in <listcomp>
    return [np.array(face_encoder.compute_face_descriptor(face_image, raw_landmark_set, num_jitters)) for raw_landmark_set in raw_landmarks]
RuntimeError: Error while calling cudaOccupancyMaxPotentialBlockSize(&num_blocks,&num_threads,K) in file /tmp/pip-install-vy5ihom1/dlib/dlib/cuda/cuda_utils.h:186. code: 98, reason: invalid device function

I followed many blog posts on reinstalling CUDA and cuDNN, but both seem to be working: every test I could find passes. I don't know what is wrong with the Python script.

Professor-Paradox commented 3 years ago

Setting up CUDA for Python development on Ubuntu

The exact syntax and commands may vary over time (they definitely will change), but the installation process stays similar.

Explanation

CUDA is NVIDIA's platform and toolkit for running general-purpose computation on NVIDIA GPUs.
cuDNN is a neural network library built on top of CUDA, used to speed up neural network training (it needs CUDA to be installed).

Packages such as tensorflow, dlib, face-recognition, opencv and pytorch use the CPU for their computation by default.
But when these packages are built or installed for a CUDA-enabled environment, they use the GPU instead, which can reduce training time considerably (about 50% as a rough figure; the actual gain depends on the GPU and the workload).
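
One way to see which path dlib will take is to ask the installed build directly. This is a minimal sketch, assuming the dlib Python bindings expose the `DLIB_USE_CUDA` flag and `dlib.cuda.get_num_devices()`; the helper name `dlib_cuda_status` is made up for this example.

```python
def dlib_cuda_status():
    """Return a short description of how the installed dlib was built."""
    try:
        import dlib  # imported lazily so the check also works when dlib is missing
    except ImportError:
        return "dlib is not installed"
    # DLIB_USE_CUDA is a build-time flag; getattr guards against builds that lack it.
    if not getattr(dlib, "DLIB_USE_CUDA", False):
        return "dlib was built without CUDA (CPU only)"
    try:
        count = dlib.cuda.get_num_devices()
    except Exception as exc:  # for example, a driver problem at run time
        return f"dlib was built with CUDA, but querying devices failed: {exc}"
    return f"dlib was built with CUDA, visible devices: {count}"


print(dlib_cuda_status())
```

A CUDA-enabled build can still fail at run time, as the error in this issue shows, so this only tells you which code path will be attempted.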

Information Gathering

Collect GPU details
Run lspci | grep -i nvidia in a terminal; it returns the GPU name and its chipset.

Example:
```
lspci | grep -i nvidia
#01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)
```
Here GK208B is the chipset of the GT 730 graphics card.
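
If you want to pull those two values out in a script, something like the following works on the sample line above. It is only a sketch: the regular expression and the function name `parse_lspci_line` are assumptions made for this example, and lspci output for other cards may be formatted differently.

```python
import re

# Matches "NVIDIA Corporation <chipset> [<model>]" as in the sample line above.
LSPCI_PATTERN = re.compile(
    r"NVIDIA Corporation (?P<chipset>\S+) \[(?P<model>[^\]]+)\]"
)


def parse_lspci_line(line):
    """Return (chipset, model) from one lspci line, or None if it does not match."""
    match = LSPCI_PATTERN.search(line)
    if match is None:
        return None
    return match.group("chipset"), match.group("model")


sample = "01:00.0 VGA compatible controller: NVIDIA Corporation GK208B [GeForce GT 730] (rev a1)"
print(parse_lspci_line(sample))  # ('GK208B', 'GeForce GT 730')
```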
Go to NVIDIA's website and search for your GPU name, then check its compute capability; that value gives the sm_version number. Make a note of it.

Mine is 3.5, so my sm version is sm_35.

Go to Wikipedia, search for your GPU name and find the microarchitecture of your GPU.
The most common microarchitectures of NVIDIA GPUs are
- Fermi
- Kepler (GeForce 700 series)
- Maxwell (GeForce 900 series)
- Pascal (GeForce 10 series)
- Turing (GeForce 20 series)

Collecting CUDA packages

Go to this website and check which CUDA version is compatible with your GPU and sm_version.

Each CUDA version supports only a limited range of sm_xx versions.
For example, CUDA 8 is the last release that supports sm_20, and CUDA 10 is the last that supports sm_30.

So errors (such as the cudaOccupancyMaxPotentialBlockSize() "invalid device function" error above) may arise if we install a CUDA version that is too new for the GPU, as I did with CUDA 11 on my GT 730: "invalid device function" means the compiled GPU code contains no kernel that the device can run.
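
The lower bound of that range can be written down as a small lookup. This is a hedged sketch: the minimum values in the table are recalled from NVIDIA's release notes and should be verified there, the function names are invented for this example, and it checks only whether the toolkit is too new for the GPU (not whether it is too old, and not which architectures a particular library was actually compiled for).

```python
# Oldest compute capability each CUDA major release can still target.
# Assumption: recalled from NVIDIA's release notes; verify before relying on it.
MIN_SM_BY_CUDA_MAJOR = {8: 20, 9: 30, 10: 30, 11: 35, 12: 50}


def sm_number(compute_capability):
    """Turn a compute capability such as '3.5' into the integer 35 (as in sm_35)."""
    major, minor = compute_capability.split(".")
    return int(major) * 10 + int(minor)


def toolkit_can_target(cuda_major, compute_capability):
    """True if this CUDA release is not too new for a GPU of this capability."""
    return sm_number(compute_capability) >= MIN_SM_BY_CUDA_MAJOR[cuda_major]


print(sm_number("3.5"))               # 35
print(toolkit_can_target(10, "3.5"))  # True
print(toolkit_can_target(9, "2.0"))   # False
```

Passing this check is necessary but not sufficient: the package you install (dlib, for instance) also has to be compiled for your architecture.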

Once you know which CUDA version supports your GPU microarchitecture, go to the NVIDIA CUDA website, select that CUDA version and follow the instructions.
This installs the CUDA package. The Ubuntu version in the package name did not matter in my case: I installed CUDA 10 for Ubuntu 18 on Ubuntu 20 without any errors.

Create an NVIDIA developer account and download the cuDNN files, the cuDNN runtime, cuDNN developer and cuDNN code samples packages (the samples are for testing the cuDNN installation), compatible with your CUDA version, from here.

Installation

First install the NVIDIA driver for CUDA: not the latest one, but the driver compatible with your CUDA version.
For example, "cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb" is my downloaded CUDA file; notice the 440.33, which is the NVIDIA driver version needed for CUDA 10.2.

So run sudo apt-get install nvidia-driver-440.
This installs the latest 440 driver available in the repos (currently 440.100), which works fine with the CUDA installation.
Reboot the PC and verify the driver installation with nvidia-smi, which shows the driver version and CUDA version.
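
When comparing the installed driver against the version in the CUDA file name, compare the dotted parts as integers rather than as text or decimals: 440.100 is newer than 440.33 even though 440.1 < 440.33 as a number. A minimal sketch (the function names are made up for this example):

```python
def parse_version(text):
    """Split a dotted version such as '440.33.01' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def driver_is_new_enough(installed, required):
    """True if the installed driver version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


print(driver_is_new_enough("440.100", "440.33"))  # True
print(driver_is_new_enough("390.25", "440.33"))   # False
```

The installed version string is the one nvidia-smi reports; the required one is taken from the CUDA file name as described above.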
Run the following to install CUDA; these are the exact commands given on the download page linked above.

# pin the NVIDIA CUDA repository so apt prefers its packages
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
# download the CUDA local repo deb file (about 2GB); all these links are available on the NVIDIA developer CUDA page
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
# the key file below only exists after the dpkg step above, so add it afterwards
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

# if a "cuda not found" error is raised, the location of the deb packages has not been added; add the line below in Software & Updates (change the version number to match yours)
#deb file:///var/cuda-repo-10-2-local-10.2.89-440.33.01 /
# or copy it into the /etc/apt/sources.list file, then run sudo apt update and sudo apt-get install cuda

# verify installed cuda versions
dpkg -l | grep cuda-toolkit

Run the following to install cuDNN:

# change to directory with downloaded cudnn file(about 1GB)
cd directoryWithDownloadedFile

# extract files from cudnn archive,change the name
tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz      

# verify that cuda is discoverable and check the nvcc version
whereis cuda
nvcc -V
# installing cudnn is just copying the pre-compiled files to cuda directory
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include        
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64       
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*   

# add environment variables to make the cuda and cudnn files discoverable
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
# headers are found through CPATH, not LD_LIBRARY_PATH (which is only for shared libraries)
echo 'export CPATH=/usr/local/cuda/include${CPATH:+:${CPATH}}' >> ~/.bashrc
source ~/.bashrc
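
To confirm from Python that a new shell actually picked up these variables, a small check like this can help. It is a sketch that assumes the default /usr/local/cuda location used above; the function name `missing_cuda_paths` is invented for this example.

```python
import os


def missing_cuda_paths(env, cuda_home="/usr/local/cuda"):
    """List the search-path variables that do not contain the CUDA directories."""
    expected = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    missing = []
    for variable, directory in expected.items():
        # Linux search paths are colon-separated lists of directories.
        entries = env.get(variable, "").split(":")
        if directory not in entries:
            missing.append(f"{variable} lacks {directory}")
    return missing


print(missing_cuda_paths(os.environ) or "CUDA directories are on both search paths")
```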

# install the downloaded cudnn runtime, development and samples deb files; change the names if needed
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.2_amd64.deb libcudnn7-dev_7.6.5.32-1+cuda10.2_amd64.deb libcudnn7-doc_7.6.5.32-1+cuda10.2_amd64.deb

Reboot, then verify cuDNN by running the mnistCUDNN test.

# copy the sample files to current directory
cp -r /usr/src/cudnn_samples_v7/ ./

# go to mnistcudnn directory 
cd cudnn_samples_v7/mnistCUDNN

# CUDA 10.2 supports gcc only up to version 8 (Ubuntu 20.04 ships gcc 9), so install gcc-8 and symlink it into the cuda bin directory
sudo apt-get install gcc-8 g++-8 
sudo ln -s /usr/bin/gcc-8 /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-8 /usr/local/cuda/bin/g++

# build and run the test
make
./mnistCUDNN
# this prints "Test passed!", which means everything is working

Working with python

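Create a Python virtual environment and install whichever machine learning package you are using (dlib, pytorch, tensorflow and so on). If it is already installed, uninstall it, clear the pip cache in ~/.cache and install it again, so that it is rebuilt against the new CUDA setup instead of being reused from the cache.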

rm -rf ~/.cache/pip/

Reinstalling the packages takes time, depending on your CPU and the number of cores you have.
After the installation is done, run your project and check whether the previous errors persist. My problem was solved after this.
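
For that re-check, a tiny wrapper can report whether the original failure is still there. This is only a sketch: `run_smoke_test` and `encode_first_face` are hypothetical helper names, and the string match relies on the wording of the error message quoted in this issue.

```python
def run_smoke_test(compute):
    """Call compute() and summarise the outcome as a short string."""
    try:
        result = compute()
    except RuntimeError as exc:
        # dlib surfaces CUDA failures to Python as RuntimeError (see the traceback above).
        if "invalid device function" in str(exc):
            return "failed: the compiled GPU code has no kernel for this GPU"
        return f"failed: {exc}"
    return f"ok: got {type(result).__name__}"


def encode_first_face(path="c.jpg"):
    """Encode the first face in an image, or return None if no face is found."""
    import face_recognition  # imported here so run_smoke_test stays dependency-free

    image = face_recognition.load_image_file(path)
    encodings = face_recognition.face_encodings(image)
    return encodings[0] if encodings else None


def fake_failure():
    """Stand-in that raises the same message as the original error."""
    raise RuntimeError("code: 98, reason: invalid device function")


print(run_smoke_test(fake_failure))
```

On a machine with face_recognition installed, the real check is `run_smoke_test(encode_first_face)`.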

Don't update the NVIDIA driver, CUDA version or cuDNN files afterwards; this might break the setup.

Don't install the nvidia-cuda-toolkit package from the Ubuntu repos; it installs whichever CUDA version the distribution ships, which will not work with every GPU.

arivudainambik commented 2 years ago

@Professor-Paradox I am now trying to install 440, but it installs 470 instead, and CUDA 10 does not install with 470.

Could you please help?

Professor-Paradox commented 2 years ago

Hi, I haven't used Ubuntu in 2 years, but I clearly remember that the issue for me was getting the specific driver for the specific GPU.

My GPU is a GT 730 and is no longer receiving driver upgrades from NVIDIA, so check whether your GPU is still supported and whether driver upgrades are available for it.

As for the 470 version, CUDA 10 doesn't support that, in my opinion: https://docs.nvidia.com/deploy/cuda-compatibility/index.html

Follow that link. My theory is that when you install any NVIDIA driver from the Ubuntu repo, it gets updated to the latest version there, so it is better to use the official NVIDIA repo and download the specific version. If you have any doubts, contact me again, but I doubt I will be of much help.

I tried this face recognition topic out of curiosity, not for actual development, but I am willing to learn.

ageitgey / face_recognition