NVlabs / tiny-cuda-nn

Lightning fast C++/CUDA neural network framework

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda' Building PyTorch extension for tiny-cuda-nn version 1.7 #237

Open Justinfungi opened 1 year ago

Justinfungi commented 1 year ago

pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch imageio_download_bin freeimage

Error is:

WARNING: Implying --no-binary=:all: due to the presence of --build-option / --global-option / --install-option. Consider using --config-settings for more flexibility.
DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at https://github.com/pypa/pip/issues/11453
Collecting git+https://github.com/NVlabs/tiny-cuda-nn#subdirectory=bindings/torch
  Cloning https://github.com/NVlabs/tiny-cuda-nn to /tmp/pip-req-build-lh3mplh3
  Running command git clone --quiet https://github.com/NVlabs/tiny-cuda-nn /tmp/pip-req-build-lh3mplh3
  Resolved https://github.com/NVlabs/tiny-cuda-nn to commit 14053e9a87ebf449d32bda335c0363dd4f5667a4
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "", line 36, in
        File "", line 34, in
        File "/tmp/pip-req-build-lh3mplh3/bindings/torch/setup.py", line 30, in
          raise EnvironmentError("Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.")
      OSError: Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.
      No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
      Building PyTorch extension for tiny-cuda-nn version 1.7
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

How can I solve it?

peterstratton commented 1 year ago

I'm having the same issue.

Batwho commented 1 year ago

Similar issue here. I'm using WSL2 with Ubuntu 20.04.

The error is:

Collecting git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
  Cloning https://github.com/NVlabs/tiny-cuda-nn/ to /tmp/pip-req-build-r_hre1hk
  Running command git clone --filter=blob:none --quiet https://github.com/NVlabs/tiny-cuda-nn/ /tmp/pip-req-build-r_hre1hk
  Resolved https://github.com/NVlabs/tiny-cuda-nn/ to commit a77dc53ed770dd8ea6f78951d5febe175d0045e9
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      /home/z3qian/miniconda3/envs/ngp_pl/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
        example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-r_hre1hk/bindings/torch/setup.py", line 49, in <module>
          raise EnvironmentError("Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.")
      OSError: Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.
      No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
      Building PyTorch extension for tiny-cuda-nn version 1.7
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

shaoxiang777 commented 1 year ago

Same issue.

woodbridge commented 1 year ago

Same here. Has anyone found a solution?

lizhiqi49 commented 1 year ago

I ran into the same issue and have solved it.

If you have confirmed that the CUDA driver is installed, that the CUDA and torch versions match, and that CUDA is installed under the path CUDA_HOME points to, then the cause may be that your machine has no GPU ...

This sounds a little stupid, but it really was my situation. I built the environment on my lab cluster's login node rather than on my own machine. The login node has no GPUs; the GPU resources you request are allocated to your experiment only when you launch a task.

So I hit this error on the login node, where no GPU is available, but the build went smoothly once I launched a test environment with 1 GPU allocated to it.

I hope this helps someone in a similar situation.
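
A quick way to check for this situation before starting the build is to ask PyTorch whether it can see a GPU. The following is only a sketch and not part of tiny-cuda-nn: the function name describe_gpu_visibility is made up for this example, and the arithmetic for the architecture number follows the setup.py excerpt quoted later in this thread.

```python
def describe_gpu_visibility():
    """Return a short message describing whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        # Typical causes: CPU-only PyTorch build, missing driver,
        # or a node (for example a cluster login node) without GPUs.
        return "PyTorch is installed, but no CUDA GPU is visible"

    major, minor = torch.cuda.get_device_capability()
    # Same arithmetic as the setup.py excerpt: (8, 6) becomes 86.
    arch = major * 10 + minor
    return f"GPU visible, compute capability {major}.{minor} (architecture {arch})"


print(describe_gpu_visibility())
```

If this reports that no GPU is visible, the build cannot detect the compute capability on its own on that machine.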

jeevan-avataar commented 1 year ago

My machine has a GPU (nvidia-smi output below), but I'm still facing the same issue.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10G         On   | 00000000:00:1E.0 Off |                    0 |
|  0%   28C    P8     9W / 300W |      0MiB / 23028MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

adli-igc commented 9 months ago

You have to specify the TCNN_CUDA_ARCHITECTURES environment variable, as shown in this excerpt from bindings/torch/setup.py:

if "TCNN_CUDA_ARCHITECTURES" in os.environ and os.environ["TCNN_CUDA_ARCHITECTURES"]:
    compute_capabilities = [int(x) for x in os.environ["TCNN_CUDA_ARCHITECTURES"].replace(";", ",").split(",")]
    print(f"Obtained compute capabilities {compute_capabilities} from environment variable TCNN_CUDA_ARCHITECTURES")
elif torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    compute_capabilities = [major * 10 + minor]
    print(f"Obtained compute capability {compute_capabilities[0]} from PyTorch")
else:
    raise EnvironmentError("Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.")
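
To make the accepted values concrete, here is a small standalone sketch of the same parsing step. The helper names parse_architectures and capability_to_architecture are hypothetical; the expressions inside them mirror the excerpt above.

```python
def parse_architectures(value):
    """Parse a TCNN_CUDA_ARCHITECTURES-style string into a list of ints.

    As in the excerpt, both ';' and ',' are accepted as separators.
    """
    return [int(x) for x in value.replace(";", ",").split(",")]


def capability_to_architecture(major, minor):
    """Convert a (major, minor) compute capability to the integer form."""
    return major * 10 + minor


print(parse_architectures("86"))         # [86]
print(parse_architectures("80;86"))      # [80, 86]
print(parse_architectures("80,86"))      # [80, 86]
print(capability_to_architecture(8, 6))  # 86
```

So a single value such as 86 is enough for one GPU model, and a list such as 80;86 targets several.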

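Also note that, by default, no GPU is visible while an image is being built, so GPU-dependent calls such as torch.cuda.is_available() return False at build time and the compute capability cannot be detected automatically. https://discuss.huggingface.co/t/how-to-deal-with-no-gpu-during-docker-build-time/28544/4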
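
When the build machine cannot see a GPU, the variable has to be set explicitly for the pip process. The sketch below shows one way to do that from Python; it is an illustration under assumptions, not an official install script. The names TCNN_SPEC, make_build_env and install_tcnn are made up here, the package URL is the one from the install command quoted in this thread, and the function defaults to a dry run so that nothing is installed unless you ask for it.

```python
import os
import subprocess
import sys

# URL taken from the install command quoted in this thread.
TCNN_SPEC = "git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch"


def make_build_env(architectures, base_env=None):
    """Return a copy of the environment with TCNN_CUDA_ARCHITECTURES set."""
    env = dict(os.environ if base_env is None else base_env)
    # setup.py accepts ';' or ',' as separators; ';' is used here.
    env["TCNN_CUDA_ARCHITECTURES"] = ";".join(str(a) for a in architectures)
    return env


def install_tcnn(architectures, dry_run=True):
    """Build the pip command and run it only when dry_run is False."""
    cmd = [sys.executable, "-m", "pip", "install", TCNN_SPEC]
    env = make_build_env(architectures)
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env["TCNN_CUDA_ARCHITECTURES"]


# Dry run for an A10G (86): prints what would be executed.
cmd, archs = install_tcnn([86])
print("TCNN_CUDA_ARCHITECTURES=" + archs, " ".join(cmd))
```

Setting the variable only replaces the automatic detection step; as far as I know the build still needs a working CUDA toolkit on the machine.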

adli-igc commented 9 months ago

Here is an example of my Dockerfile, in which I've set the CUDA architecture:

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:22.05-py3
FROM $BASE_IMAGE

RUN apt-get update -yq --fix-missing \
 && DEBIAN_FRONTEND=noninteractive apt-get install -yq --no-install-recommends \
    pkg-config \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    libglvnd-dev \
    libgl1-mesa-dev \
    libegl1-mesa-dev \
    libgles2-mesa-dev \
    cmake \
    curl \
    zip \
    openssh-server openssh-client git unzip colmap docker.io imagemagick

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility,graphics

# Default pyopengl to EGL for good headless rendering support
ENV PYOPENGL_PLATFORM=egl

COPY 10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

# Set architecture for Tesla A10G (86), or A100 (80)
ENV TCNN_CUDA_ARCHITECTURES=86

# Upgrade pip to latest version to solve wheel issues. Refer to https://github.com/pypa/pip/issues/7555
RUN python3 -m pip install --upgrade pip

# Install new version of pytorch 2.0.1 with Cuda 11.7
RUN pip3 install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2

# additional libraries
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# tiny-cuda-nn.
RUN pip install --global-option="--no-networks" git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch