facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

clean install of conda env create -f conda-extras.yaml on Ubuntu fails to install cuml-cu11, any ideas? #334

Open lovettchris opened 10 months ago

lovettchris commented 10 months ago

Fails with this error:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Installing pip dependencies: - Ran pip subprocess with arguments:
['/home/smartreplayuser/miniconda3/envs/dinov2-extras/bin/python', '-m', 'pip', 'install', '-U', '-r', '/home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt', '--exists-action=b']
Pip subprocess output:
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting git+https://github.com/facebookincubator/submitit (from -r /home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt (line 1))
  Cloning https://github.com/facebookincubator/submitit to /tmp/pip-req-build-u5ykxk6q
  Resolved https://github.com/facebookincubator/submitit to commit 07f21fa1234e34151874c00d80c345e215af4967
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting cuml-cu11 (from -r /home/smartreplayuser/git/Facebook/dinov2/condaenv.slco21xq.requirements.txt (line 3))
  Downloading cuml-cu11-23.12.0.tar.gz (6.8 kB)
  Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'

Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/facebookincubator/submitit /tmp/pip-req-build-u5ykxk6q
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
╰─> [16 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-j61__ee5/cuml-cu11_45092e6783154e1f9c70d31f80db8581/setup.py", line 137, in <module>
raise RuntimeError(open("ERROR.txt", "r").read())
RuntimeError:
###########################################################################################
The package you are trying to install is only a placeholder project on PyPI.org repository.
This package is hosted on NVIDIA Python Package Index.

This package can be installed as:

$ pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11

###########################################################################################

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
failed

CondaEnvException: Pip failed

The recommended fix (pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11) fails with the same error.

clemsgrs commented 10 months ago

Same error here. It used to work well but has been failing since today.

Florian2Richter commented 10 months ago

Same error here. FWIW: Ubuntu 22.04 with CUDA 12.2 and NVIDIA driver 535.129.03. Replacing "cuml-cu11" with "cuml-cu12" did not work.

kuma94506 commented 10 months ago

You may try to install an older version like this: $ pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com cuml-cu11==23.10.0
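For anyone scripting this, here is a minimal Python sketch of the pinned install suggested above. The helper names `pip_install_args` and `install` are hypothetical and only wrap the pip command already quoted in this thread; the version pin is the one suggested here and is reported below to work for some users but not others.

```python
import subprocess
import sys

# Index URL quoted in the error message earlier in this thread.
NVIDIA_INDEX = "https://pypi.nvidia.com"


def pip_install_args(package, version=None, extra_index=NVIDIA_INDEX):
    """Build the argument list for a (optionally pinned) pip install.

    Nothing is executed here; the list is only assembled.
    """
    requirement = f"{package}=={version}" if version else package
    return [
        sys.executable, "-m", "pip", "install",
        "--no-cache-dir",
        "--extra-index-url", extra_index,
        requirement,
    ]


def install(package, version=None):
    """Run the install with the interpreter of the active environment."""
    subprocess.run(pip_install_args(package, version), check=True)


# Show the command without running it (skip the interpreter path).
print(" ".join(pip_install_args("cuml-cu11", "23.10.0")[1:]))
```

Using `sys.executable -m pip` rather than a bare `pip` keeps the install inside whichever conda env or venv is currently active.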

lovettchris commented 9 months ago

that works, thanks.

clemsgrs commented 9 months ago

Interesting, for me pip install --no-cache-dir --extra-index-url https://pypi.nvidia.com/ cuml-cu11==23.10.0 didn't work either (same error; tried older versions too).

Florian2Richter commented 9 months ago

Seems to be fixed by now... usual pip install -r requirements.txt worked fine.

lovettchris commented 9 months ago

Interesting, Florian, can you post the version of CUDA, pytorch and Python that you are using?

Florian2Richter commented 9 months ago

Sure: my virtual environment (venv, not conda) has Python 3.10.12 and PyTorch 2.0.0+cu117, with NVIDIA driver 535.129.03 and CUDA 12.2.
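When comparing setups like this, a small script can collect the versions in one place. The sketch below is only an illustration: the function name `report_versions` and the default package list are assumptions, and it reports Python package versions only. The driver and CUDA versions still have to be read from a tool such as nvidia-smi.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "torchvision", "cuml-cu11", "mmcv", "mmcv-full")):
    """Return {name: version or 'not installed'}, plus the Python version."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


for name, version in report_versions().items():
    print(f"{name}: {version}")
```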

clemsgrs commented 9 months ago

The issue seems fixed for me. FYI, I'm using the conda installation.

lovettchris commented 9 months ago

@Florian2Richter interestingly, conda.yaml and conda-extras.yaml specify Python 3.9.

To get the dinov2 segmentation head working on Ubuntu, I had to build mmcv from source using MMCV_WITH_OPS=1 pip install -e . which required a newer version of GCC that supports C++17. After that the segmentation head worked on CUDA, and I measured about 4 seconds per inference on a Tesla T4 GPU using the small backbone dinov2_vits14. I also had to install the ftfy and regex pip packages. For me, "pip install mmcv-full==1.5.0" results in the error:

ModuleNotFoundError: No module named 'mmcv._ext'
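A quick way to check whether an installed mmcv actually ships the compiled ops is to look for the modules named in these errors. This is a minimal sketch; `has_module` is a hypothetical helper built on the standard library's `importlib.util.find_spec`. Note that locating a module does not guarantee that it loads (for example, with a mismatched CUDA or compiler build).

```python
import importlib.util


def has_module(name):
    """True if `name` can be located, without importing the leaf module itself.

    Parent packages (e.g. `mmcv` for `mmcv._ext`) are imported as a side effect.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no valid spec.
        return False


for name in ("mmcv", "mmcv._ext", "mmcv.ops"):
    print(f"{name}: {'found' if has_module(name) else 'missing'}")
```

If `mmcv` is found but `mmcv._ext` is missing, the installed package was most likely built without ops, which matches the error above.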

bruce-willis commented 7 months ago

I can confirm that installing mmcv from source fixed the issue when running the segmentation scripts. You need to clone the specific version (not just the main branch) with git clone https://github.com/open-mmlab/mmcv.git --branch v1.5.3 --single-branch and, if needed, install nvcc v11.7 before building mmcv (conda install -c conda-forge cudatoolkit-dev=11.7). With the regular pip install I get the same error as @lovettchris.

@lovettchris, have you tried to reproduce the segmentation results? I am a little lost about the patch size: all dinov2 backbones have a patch size of 14, however the segmentation evaluation assumes 16: "It is used to produce a low-resolution logit map (eg 32x32 for a model with patch size 16)", and the input image size is 512, which is divisible by 16 but not by 14. After modifying the config for a patch size of 14, I can partly reproduce the results for ADE20k, but not for Pascal VOC.
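The arithmetic behind the mismatch can be checked directly. This is a small sketch with hypothetical helper names; it only computes how many patches fit along one side and which nearby image sizes are exact multiples of the patch size, and says nothing about how the evaluation code actually handles the leftover pixels.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, leftover pixels) for a square image."""
    return divmod(image_size, patch_size)


def nearest_multiples(image_size, patch_size):
    """Largest multiple <= image_size and smallest multiple >= image_size."""
    lower = (image_size // patch_size) * patch_size
    upper = -(-image_size // patch_size) * patch_size  # ceiling division
    return lower, upper


print(patch_grid(512, 16))         # (32, 0): a 32x32 map, nothing left over
print(patch_grid(512, 14))         # (36, 8): 8 pixels per side do not fit
print(nearest_multiples(512, 14))  # (504, 518)
```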

EDIT (22.02.24): I've managed to reproduce the results for both ADE20k and Pascal VOC. Don't forget to override the init_weights() method of your backbone. It is not enough to load the checkpoint weights in the constructor. Otherwise, during segmentation training the weights can be overwritten by the default weight initialization (source).

Your backbone (dinov2/eval/segmentation/models/backbones/vision_transformer.py) should look similar to this.
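The pitfall can be illustrated with a framework-free toy. Everything in this sketch is hypothetical: the class names, the dictionary standing in for model weights, and the random re-initialization standing in for a framework default. A real backbone would subclass the framework's module class and load a state dict instead.

```python
import random


class BaseBackbone:
    """Stand-in for a framework base class whose init_weights() re-initializes."""

    def __init__(self):
        self.weights = {"patch_embed": 0.0}

    def init_weights(self):
        # Default behaviour: overwrite every weight with a fresh random value.
        self.weights = {k: random.random() for k in self.weights}


class NaiveBackbone(BaseBackbone):
    """Loads the checkpoint in the constructor only."""

    def __init__(self, checkpoint):
        super().__init__()
        self.weights = dict(checkpoint)


class SafeBackbone(NaiveBackbone):
    """Also overrides init_weights() so a later call restores the checkpoint."""

    def __init__(self, checkpoint):
        super().__init__(checkpoint)
        self._checkpoint = dict(checkpoint)

    def init_weights(self):
        self.weights = dict(self._checkpoint)


checkpoint = {"patch_embed": 42.0}  # toy stand-in for pretrained weights
naive, safe = NaiveBackbone(checkpoint), SafeBackbone(checkpoint)
for model in (naive, safe):
    model.init_weights()  # what a training script might call after building the model

print(naive.weights == checkpoint)  # False: pretrained values were overwritten
print(safe.weights == checkpoint)   # True: the override reloaded them
```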

Feel free to ping me if you have some issues.

mfoglio commented 3 months ago

I had several issues setting up the environment for segmentation properly.

I did the following:

I still get the error: ModuleNotFoundError: No module named 'mmcv.ops'. @lovettchris @bruce-willis how did you fix this? Thanks