MouseLand / cellpose

a generalist algorithm for cellular segmentation with human-in-the-loop capabilities
https://www.cellpose.org/
BSD 3-Clause "New" or "Revised" License

[INSTALL] Cellpose not interfacing with GPU torch #916

Closed iamlll closed 2 months ago

iamlll commented 2 months ago

Install problem: I have tried installing cellpose and torch (GPU version) in several conda environments (deleting each environment before trying again) without issue, following the README installation instructions. However, when I run cellpose in a script it cannot detect or interface with my computer's GPU. Here is the setup of my latest attempt:

  1. Create an environment with conda env create -f environment.yml
  2. Uninstall torch: pip uninstall torch
  3. Install the GPU build of torch: pip3 install torch --index-url https://download.pytorch.org/whl/cu118

(I've also tried using conda commands to install pytorch, and installing older versions of CUDA. The end result is always the same: torch is not installed properly.)

Environment info: pkglist.txt. I am using an Nvidia GeForce GTX Titan X. Here is the package list from using conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia to install pytorch instead of torch: pkglist_pytorch.txt

Run log Here's the test code I was using to check whether cellpose could access my GPU:

from cellpose import io, core

io.logger_setup()
core.use_gpu()

This code snippet returns the following output:

2024-04-15 19:20:12,179 [INFO] WRITING LOG OUTPUT TO /home/user/.cellpose/run.log
2024-04-15 19:20:12,179 [INFO] 
cellpose version:   3.0.7 
platform:           linux 
python version:     3.8.5 
torch version:      2.2.2+cu118
(<Logger cellpose.io (INFO)>, PosixPath('/home/user/.cellpose/run.log'))
>>> core.use_gpu()
2024-04-15 19:20:25,852 [INFO] TORCH CUDA version not installed/working.
False
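For reference, torch itself can be queried a bit more verbosely than core.use_gpu() does. This is a minimal sketch using only standard torch calls; it runs even when torch or a GPU is absent:

```python
# A slightly fuller diagnostic than core.use_gpu(): reports torch/CUDA status
# and degrades gracefully when torch is missing or no GPU is visible.
def gpu_report():
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    lines = ["torch version: " + torch.__version__]
    if torch.cuda.is_available():
        lines.append("CUDA build: " + str(torch.version.cuda))
        lines.append("device: " + torch.cuda.get_device_name(0))
    else:
        lines.append("CUDA not available (CPU-only build, or driver problem)")
    return "\n".join(lines)

print(gpu_report())
```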
rockystones commented 2 months ago

I had similar problems before; the following procedure fixed them for me.

  1. Check your GPU driver and CUDA compatibility: Use nvidia-smi to find the highest CUDA version supported by your GPU and current GPU driver. It will display something like:
    | NVIDIA-SMI 532.09                 Driver Version: 532.09       CUDA Version: 12.1     |
    |-----------------------------------------+----------------------+----------------------+
    | GPU  Name                      TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                                         |                      |               MIG M. |
    |=========================================+======================+======================|
    |   0  NVIDIA GeForce GTX 1660 Ti    WDDM | 00000000:01:00.0  On |                  N/A |
    | N/A   57C    P8                9W /  N/A|    350MiB /  6144MiB |      3%      Default |
    |                                         |                      |                  N/A |
    +-----------------------------------------+----------------------+----------------------+

    For newer GPUs with up-to-date drivers this is not a problem, but it can be an issue for older GPUs that do not support new CUDA versions.

  2. Check the pytorch and other CUDA packages installed using conda list. You should see something like this:
    cuda-cccl                 12.4.127                      0    nvidia
    cuda-cudart               11.8.89                       0    nvidia
    cuda-cudart-dev           11.8.89                       0    nvidia
    cuda-cupti                11.8.87                       0    nvidia
    cuda-libraries            11.8.0                        0    nvidia
    cuda-libraries-dev        11.8.0                        0    nvidia
    cuda-nvrtc                11.8.89                       0    nvidia
    cuda-nvrtc-dev            11.8.89                       0    nvidia
    cuda-nvtx                 11.8.86                       0    nvidia
    cuda-profiler-api         12.4.127                      0    nvidia
    cuda-runtime              11.8.0                        0    nvidia
    pytorch                   2.2.2           py3.8_cuda11.8_cudnn8_0    pytorch
    pytorch-cuda              11.8                 h24eeafa_5    pytorch

The CUDA package versions and the pytorch-cuda version should not be higher than the CUDA version reported by nvidia-smi.

  3. Check the version of the CUDA toolkit installed using nvcc --version. You should see output like the one below, indicating the CUDA compiler driver version, which should align with the pytorch-cuda version above. I've seen people report problems when the nvcc and pytorch-cuda versions don't align:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0

    If you instead see 'nvcc' is not recognized as an internal or external command, operable program or batch file., manually install/reinstall the right version of the CUDA toolkit from the CUDA archive (https://developer.nvidia.com/cuda-toolkit-archive), which includes the Nvidia CUDA Compiler (NVCC).

  4. After all the previous steps check out, try the check method from the pytorch website using:

    import torch
    torch.cuda.is_available()

    or

    python -c "import torch; print(torch.cuda.is_available())"

    If it returns True, cellpose can recognize the GPU, and you can check the "use GPU" box in the software. In my experience, the "use GPU" checkbox stays gray and unusable whenever torch.cuda.is_available() returns False.
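The version checks above (nvidia-smi, conda list, nvcc) can be sketched as a small parsing helper. The regexes assume the output formats shown above; the sample strings here are shortened copies of those outputs:

```python
import re

def cuda_version_from_smi(smi_output):
    """Extract the driver's maximum supported CUDA version from nvidia-smi output."""
    m = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    return m.group(1) if m else None

def cuda_version_from_nvcc(nvcc_output):
    """Extract the toolkit release from `nvcc --version` output."""
    m = re.search(r"release\s*([\d.]+)", nvcc_output)
    return m.group(1) if m else None

def compatible(driver_cuda, toolkit_cuda):
    """The toolkit / pytorch-cuda version must not exceed what the driver supports."""
    as_tuple = lambda v: tuple(int(x) for x in v.split("."))
    return as_tuple(toolkit_cuda) <= as_tuple(driver_cuda)

smi = "| NVIDIA-SMI 532.09   Driver Version: 532.09   CUDA Version: 12.1 |"
nvcc = "Cuda compilation tools, release 11.8, V11.8.89"
print(compatible(cuda_version_from_smi(smi), cuda_version_from_nvcc(nvcc)))  # True: 11.8 <= 12.1
```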

When you then run an analysis with cellpose, you should see output confirming that the GPU is in use:

2024-04-16 17:06:43,472 [INFO] ** TORCH CUDA version installed and working. **
2024-04-16 17:06:43,472 [INFO] >>>> using GPU

Good luck!

iamlll commented 2 months ago

Thank you so much, this is so helpful and finally fixed my problem! I'll outline what I did in case it may be helpful to other users:

  1. nvidia-smi outputted an error message saying it could not establish communication with the driver I had been using, so I uninstalled everything related to nvidia and cuda with
    sudo apt-get remove --purge '^nvidia-.*'
    sudo apt-get remove --purge '^libnvidia-.*'
    sudo apt-get remove --purge '^cuda-.*'
  2. Then I checked "Software & Updates" --> "Additional Drivers" (on Linux) to install/update my driver, after which nvidia-smi worked again.
  3. Followed the cellpose README instructions with environment.yml to install pytorch. In previous attempts conda sometimes failed to solve my environment when I tried to install other packages after cellpose, so I added all the other packages I knew I needed to the same .yml file before creating the conda environment.
  4. Downloaded the corresponding version (11.8) of the nvidia-cuda-toolkit from the developer website. By default the installer tries to install both the Nvidia driver and the toolkit, so I had to uncheck the "Driver" option while installing.
  5. Added /usr/local/cuda-11.8/lib64 to the LD_LIBRARY_PATH variable and /usr/local/cuda-11.8/bin to the PATH variable.
  6. Somehow the conda installation conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia of GPU torch did not work for me (torch would not import properly, probably because some of the cuda packages installed via this command were CUDA version 12 instead of 11), so I had to use the pip installation pip3 install torch --index-url https://download.pytorch.org/whl/cu118 instead.

After that, torch.cuda.is_available() == True!
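The PATH changes in step 5 can be made persistent by appending exports to ~/.bashrc. A sketch, assuming the toolkit landed in the default /usr/local/cuda-11.8 location:

```shell
# Append to ~/.bashrc so the CUDA 11.8 toolkit is found in every shell.
# The ${VAR:+:${VAR}} form avoids a dangling colon when the variable is unset.
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```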