DeepLabCut / DeepLabCut-core

Headless DeepLabCut (no GUI support)
http://deeplabcut.org
GNU Lesser General Public License v3.0

Compatibility with RTX 3080 #13

Open PlatinumYao opened 3 years ago

PlatinumYao commented 3 years ago

OS: Win 10
DeepLabCut version: DeepLabCut-core tf 2.2 alpha
Anaconda env used: DLC-GPU (cloned the DLC-GPU env and uninstalled CUDA and cuDNN from the clone)
TensorFlow version: TF 2.3, TF 2.4, or tf-nightly, installed with pip (see below)
CUDA version: 11.0 and 11.1 (see below)

Hi everyone,

First of all, thank you to the DeepLabCut team! I have been using DLC for whisker tracking on an RTX 2060 for a while, and it has significantly helped my project. Recently we got an RTX 3080 in the lab, but I have had a hard time setting it up for DLC because of compatibility issues.

I noticed that the RTX 3000 series does not support CUDA 10.x or earlier, so I installed CUDA 11.0 or CUDA 11.1 with the corresponding cuDNN on Windows. I also cloned the DLC-GPU conda environment and uninstalled the original CUDA and cuDNN from the clone to prevent conflicts. TensorFlow supports CUDA 11.0 starting from TensorFlow 2.4, so I installed TensorFlow 2.4 or tf-nightly 2.5 in the conda environment (via pip). I also tried TF 2.3 to check whether it is indeed incompatible with CUDA 11.x. I followed https://github.com/DeepLabCut/DeepLabCut-core/blob/tf2.2alpha/Colab_TrainNetwork_VideoAnalysis_TF2.ipynb to install DeepLabCut-core tf 2.2 alpha and tf-slim and to run deeplabcut-core. However, I could not get training to start in any of these settings. Here is a summary:

CUDA | TensorFlow | Result
--- | --- | ---
CUDA 11.0 | TF-2.3 | TF cannot recognize the GPU because it looks for .dll files that only exist in CUDA 10.x
CUDA 11.0 | TF-2.4 | TF recognizes the GPU smoothly; training fails to start with an error message (see Notes 1)
CUDA 11.0 | TF-nightly | TF recognizes the GPU smoothly; training fails to start with an error message (see Notes 1)
CUDA 11.1 | TF-2.4 | TF recognizes the GPU with a trick (see Notes 2); training fails to start with no error message
CUDA 11.1 | TF-nightly | TF recognizes the GPU with a trick (see Notes 2); training fails to start with no error message

I tested a simple TensorFlow script (https://www.tensorflow.org/tutorials/quickstart/advanced), and it seemed to work fine on the GPU in the last 4 configurations listed above.
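When comparing configurations like the ones summarized above, it can help to print what TensorFlow itself reports about its build and the GPUs it sees. The following is a minimal sketch, not something from the DeepLabCut documentation: `gpu_report` is a hypothetical helper name, it assumes TF 2.1 or later for `tf.config.list_physical_devices`, and it reads build info only if `tf.sysconfig.get_build_info` is available in the installed version.

```python
def gpu_report():
    """Return a dict describing the TensorFlow install, or None if TF cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    report = {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # An empty list means TensorFlow does not see any GPU.
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }

    # get_build_info is not present in older releases, so look it up defensively.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        info = get_build_info()
        report["cuda_version"] = info.get("cuda_version")
        report["cudnn_version"] = info.get("cudnn_version")
    return report


if __name__ == "__main__":
    print(gpu_report())
```

Running this once per environment gives a compact record of which TF build was paired with which CUDA version, which is easier to compare than reading the import logs.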

Notes 1: Error message: failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED. I also saw VRAM usage explode in Windows Task Manager after I started training. I tried to cap memory use with "config.gpu_options.per_process_gpu_memory_fraction = 0.6". Unfortunately, it did not help.
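For readers unfamiliar with the setting quoted in Notes 1, here is a rough sketch of how such a config object is usually built under TF 2.x through the `tf.compat.v1` API. `make_session_config` is a hypothetical helper name, and where the resulting config would have to be passed inside deeplabcutcore is not shown in this thread, so treat that part as an open assumption.

```python
def make_session_config(fraction=0.6):
    """Build a TF1-style session config that caps per-process GPU memory.

    Returns None if TensorFlow cannot be imported.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    config = tf.compat.v1.ConfigProto()
    # Fraction of each visible GPU's memory that the process may allocate.
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    return config
```

The setting only has an effect if the config reaches the session that runs training, for example `tf.compat.v1.Session(config=make_session_config(0.6))`. As reported above, it did not resolve the CUBLAS_STATUS_ALLOC_FAILED error in this case.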

Notes 2: TF could not recognize the GPU because it could not find "cusolver64_10.dll", which exists in CUDA 11.0 but is replaced by "cusolver64_11.dll" in CUDA 11.1. So I copied "cusolver64_11.dll" and renamed the copy to "cusolver64_10.dll". TF can recognize the GPU after that, but training still does not start. VRAM usage increased (but did not explode) in Task Manager after training started, and after about 30 seconds, ipython or python simply closed without any error message.
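To see which of the two cusolver DLLs named in Notes 2 is actually reachable before copying or renaming anything, a small standard-library check can help. This is only a sketch: `find_on_path` is a hypothetical helper, and it inspects the `PATH` environment variable only, which approximates but does not exactly match the Windows DLL search order.

```python
import os
from pathlib import Path


def find_on_path(filenames, search_path=None):
    """Map each filename to its first match on the search path, or None if absent."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    dirs = [Path(p) for p in search_path.split(os.pathsep) if p]

    found = {}
    for name in filenames:
        # Take the first directory that contains a regular file with this name.
        found[name] = next(
            (str(d / name) for d in dirs if (d / name).is_file()), None
        )
    return found


if __name__ == "__main__":
    dlls = ["cusolver64_10.dll", "cusolver64_11.dll"]
    for name, location in find_on_path(dlls).items():
        print(f"{name}: {location or 'not found on PATH'}")
```

If only "cusolver64_11.dll" is found, the TF build in use is probably looking for the CUDA 11.0 name, which matches the situation described in Notes 2.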

I also carefully followed the suggestions in https://github.com/DeepLabCut/DeepLabCut/issues/944. They are very useful, but I still cannot get my RTX 3080 to work.

Do you have any more suggestions that I could try? Does anyone have a guide for setting up DLC-core on the RTX 3000 series?

Thank you in advance

dlramamurthy commented 3 years ago

Thanks for this post -- I have also been having these exact issues when trying to install with RTX 3070. I made an initial post which didn't go into the same level of detail as you have: https://github.com/DeepLabCut/DeepLabCut/issues/1078 I would love to hear if you find out how to get it running!

bobfromjapan commented 3 years ago

Hello. I'm an RTX 3000 user too!

This may not be the answer you are looking for, but I was able to run DeepLabCut 2.2b8 on my RTX 3080 using the package tensorflow-directml. This is an UNOFFICIAL WAY that relies on a package DeepLabCut does not normally use, and it runs slower than native CUDA, but I was able to confirm that it works.

I wrote a report about this on image.sc: https://forum.image.sc/t/fyi-deeplabcut-worked-on-radeon-gpu-rtx3080-using-tensorflow-directml/47700

angelgho commented 3 years ago

OS: Win 10
Installation sequence: Build CUDA following: https://www.reddit.com/r/tensorflow/comments/jsalkw/rtx_3090_and_tensorflow_for_windows_10_step_by/
CUDA: 11.1
cuDNN: v8.0.5.39
Anaconda env used: DLC-GPU (latest, v2.1.10.2)
TensorFlow version: pip install tensorflow-gpu==2.4.1
deeplabcutcore: pip installed first, later from GitHub directly (see below)
tf-slim: pip install tf-slim==1.1.0

Hi all, Another RTX 3070 user checking in.

I found that the pip-installed deeplabcutcore seems to be an older version than the one on GitHub. When I ran testscript.py with the pip-installed deeplabcutcore, it gave many TensorFlow-related errors while importing deeplabcutcore (for example, module 'tensorflow.python.framework.ops' has no attribute 'RegisterShape'). With the version downloaded from the GitHub repo, it works.
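One way to confirm which versions are actually installed in the active environment, before and after switching from the pip release to the GitHub checkout, is to query the package metadata. This is a sketch that assumes Python 3.8 or later (for `importlib.metadata`) and assumes the distribution is registered under the name "deeplabcutcore"; `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the pip commands in this thread;
    # "deeplabcutcore" is assumed to be the registered name.
    for dist in ("deeplabcutcore", "tensorflow-gpu", "tf-slim"):
        print(dist, installed_version(dist))
```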

I also had the same problem with VRAM exploding. I guess it might be because I am also using the RTX 3070 as my display GPU. I found a solution here: https://github.com/tensorflow/tensorflow/issues/46209. I added the following two lines after "import tensorflow as tf" in Lib\site-packages\deeplabcutcore\pose_estimation_tensorflow\train.py:

    physical_devices = tf.config.list_physical_devices('GPU')
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

It then worked well.
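A slightly more defensive variant of the memory-growth fix described above is sketched here. It is not the code from the linked TensorFlow issue: `enable_gpu_memory_growth` is a hypothetical helper that applies the setting to every visible GPU instead of indexing `physical_devices[0]`, so it does not fail on a machine where no GPU is detected.

```python
def enable_gpu_memory_growth():
    """Enable memory growth on every visible GPU; return the names of GPUs configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return []

    configured = []
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            # Allocate GPU memory on demand instead of reserving it all up front.
            tf.config.experimental.set_memory_growth(gpu, True)
            configured.append(gpu.name)
        except RuntimeError as err:
            # Raised when the GPU has already been initialized by TensorFlow.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    print(enable_gpu_memory_growth())
```

As with the original two lines, this has to run before TensorFlow initializes the GPU, so it belongs near the top of the script, right after the import.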

Hope it helps! Best, Chen

F2AGLAXY commented 3 years ago

Hi Yao, I am new to DLC and I also use an RTX 2060, but it does not work for me. Could you share your config? Best