MIC-DKFZ / nnUNet

Apache License 2.0

nnUNet with CUDA 11.8 #2506

Closed: satara closed this issue 1 month ago

satara commented 1 month ago

I am installing nnUNet in a Docker container whose CUDA driver is 11.4. I first install the most recent torch version compatible with the driver, and it runs OK (CUDA 11.8 should be compatible with the 11.4 driver):

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
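One way to confirm which CUDA build pip actually left installed is to read torch's version string: wheels from the PyTorch index carry a local label such as `+cu118` (as in the pip log), while a bare version like `2.4.1` says nothing about CUDA. A minimal sketch, assuming CUDA builds are always labelled `+cuXYZ`; `cuda_tag_from_version` and `installed_torch_version` are hypothetical helpers, not part of torch or nnUNet.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def cuda_tag_from_version(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a local version label.

    "2.0.1+cu118" -> "11.8"; "2.0.1+cpu" or a bare "2.4.1" -> None.
    Assumption: CUDA builds are labelled "+cuXYZ" (major digits, then one
    minor digit); a bare version carries no CUDA information at all.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def installed_torch_version() -> Optional[str]:
    """Return the installed torch version string, or None if torch is absent."""
    try:
        return version("torch")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    print("torch:", found)
    if found is not None:
        print("CUDA build from label:", cuda_tag_from_version(found))
```

Running this before and after installing nnUNet shows whether the `+cu118` build survived.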

When I install nnUNet, pip uninstalls torch 2.0.1, installs torch 2.4.1 instead, and this creates incompatibilities with other libraries:

  Attempting uninstall: triton
    Found existing installation: triton 2.0.0
    Uninstalling triton-2.0.0:
      Successfully uninstalled triton-2.0.0
  Attempting uninstall: torch
    Found existing installation: torch 2.0.1+cu118
    Uninstalling torch-2.0.1+cu118:
      Successfully uninstalled torch-2.0.1+cu118
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.0.2+cu118 requires torch==2.0.1, but you have torch 2.4.1 which is incompatible.
torchdata 0.6.1 requires torch==2.0.1, but you have torch 2.4.1 which is incompatible.
torchtext 0.15.2+cpu requires torch==2.0.1, but you have torch 2.4.1 which is incompatible.
torchvision 0.15.2+cu118 requires torch==2.0.1, but you have torch 2.4.1 which is incompatible.
Successfully installed argparse-1.4.0 nnunetv2-2.5.1 torch-2.4.1 triton-3.0.0
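Without an explicit upgrade flag, pip normally replaces an installed package only when it fails a requirement of what is being installed, so the switch to torch 2.4.1 most likely means nnunetv2 2.5.1 declares a torch requirement that 2.0.1+cu118 does not meet. A minimal sketch for reading that declared requirement from the installed package metadata with the standard library; the helper names are hypothetical, and the actual requirement string is whatever the metadata reports, not a value assumed here.

```python
import re
from importlib.metadata import PackageNotFoundError, requires
from typing import List


def requirement_name(req: str) -> str:
    """Return the lower-cased project name at the start of a requirement string."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    return match.group(1).lower() if match else ""


def filter_requirements(reqs: List[str], name: str = "torch") -> List[str]:
    """Keep only requirements for exactly `name` (so "torchvision" is excluded)."""
    return [r for r in reqs if requirement_name(r) == name]


def declared_requirements(dist: str = "nnunetv2", name: str = "torch") -> List[str]:
    """Requirement strings for `name` declared by the installed `dist`, if any."""
    try:
        reqs = requires(dist) or []
    except PackageNotFoundError:
        return []  # dist is not installed in this environment
    return filter_requirements(reqs, name)


if __name__ == "__main__":
    print(declared_requirements())
```

Comparing that output with the installed torch version shows whether pip had to replace the cu118 build.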

And when I train my model I get the following error:

File "/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
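The number in the error message is the driver's CUDA version in CUDA's integer encoding (1000 * major + 10 * minor), so 11040 means 11.4. A minimal sketch of that decoding, plus a deliberately simplified compatibility rule (same or older major version runs, newer major does not). The helper names are hypothetical, and the rule ignores the exact minimum driver builds NVIDIA lists per release and the feature limits of minor version compatibility.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000 * major + 10 * minor, e.g. 11040 -> (11, 4)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_can_run(driver_encoded: int, runtime: str) -> bool:
    """Simplified rule, assuming a CUDA >= 11 runtime.

    A driver runs runtimes of its own major version (minor version
    compatibility) and of older majors, but not runtimes of a newer major.
    """
    driver_major, _ = decode_cuda_version(driver_encoded)
    runtime_major = int(runtime.split(".")[0])
    return driver_major >= runtime_major


if __name__ == "__main__":
    print(decode_cuda_version(11040))     # (11, 4)
    print(driver_can_run(11040, "11.8"))  # True
    print(driver_can_run(11040, "12.1"))  # False
```

In a live environment, `torch.version.cuda` reports the runtime string to pass as `runtime`. Assuming the plain torch 2.4.1 wheel from PyPI is a CUDA 12.x build, the check returns False for this 11.4 driver, which matches the error above.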

The same happens if I install an older version of torch (compatible with CUDA 11.3). Please help!

satara commented 1 month ago

Updated everything to the latest version, built triton from source, and now it is working. Thanks.