Stability-AI / stable-fast-3d

SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
https://stable-fast-3d.github.io

Trouble getting started - "Ninja is required to load C++ extensions" #18

Closed DrCyanide closed 3 months ago

DrCyanide commented 3 months ago

I've got Ninja installed (via pip install Ninja) and I can import it in Python, but I'm still getting an error that Ninja is required. There are two onnxruntime errors before that which I'm stumped by.

Onnxruntime 1.18.1, Ninja 1.11.1.1, CUDA 12.4, cuDNN 9.3

Running on an RTX 3070, which should be new enough that there are no issues.

D:\Documents\AI\2D-to-3D\stable-fast-3d>python gradio_app.py
2024-08-05 06:37:25.0756191 [E:onnxruntime:Default, provider_bridge_ort.cc:1731 onnxruntime::TryGetProviderInfo_TensorRT] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\onnxruntime\capi\onnxruntime_providers_tensorrt.dll"

*************** EP Error ***************
EP Error D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:456 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
 when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
****************************************
2024-08-05 06:37:25.1522281 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a\_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

Traceback (most recent call last):
  File "D:\Documents\AI\2D-to-3D\stable-fast-3d\gradio_app.py", line 43, in <module>
    model = SF3D.from_pretrained(
  File "D:\Documents\AI\2D-to-3D\stable-fast-3d\sf3d\system.py", line 89, in from_pretrained
    model = cls(cfg)
  File "D:\Documents\AI\2D-to-3D\stable-fast-3d\sf3d\models\utils.py", line 29, in __init__
    self.configure(*args, **kwargs)
  File "D:\Documents\AI\2D-to-3D\stable-fast-3d\sf3d\system.py", line 139, in configure
    self.baker = TextureBaker()
  File "D:\Documents\AI\2D-to-3D\stable-fast-3d\sf3d\texture_baker.py", line 13, in __init__
    self.baker = slangtorch.loadModule(
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\slangtorch\slangtorch.py", line 617, in loadModule
    rawModule = _loadModule(fileName, moduleName, buildDir, options, sourceDir=outputFolder, verbose=verbose, includePaths=includePaths, dryRun=False)
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\slangtorch\slangtorch.py", line 536, in _loadModule
    slangLib, metadata = compileAndLoadModule(
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\slangtorch\slangtorch.py", line 427, in compileAndLoadModule
    slangLib = _compileAndLoadModule(metadata, sources, moduleName, buildDir, slangSourceDir, verbose)
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\slangtorch\slangtorch.py", line 466, in _compileAndLoadModule
    return jit_compile(
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\slangtorch\util\compile.py", line 71, in jit_compile
    _write_ninja_file_and_build_library(
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\torch\utils\cpp_extension.py", line 1793, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "C:\Users\Username\AppData\Roaming\Python\Python310\site-packages\torch\utils\cpp_extension.py", line 1842, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions

jammm commented 3 months ago

Is PyTorch installed correctly, and did you run pip install -r requirements.txt and pip install -r requirements_demo.txt? I would recommend creating a fresh venv, installing the requirements there with those two pip commands, and then retrying gradio_app.py.

DrCyanide commented 3 months ago

I did both pip installs. PyTorch 2.2.0+cu121 is installed.

I'll try a virtual environment, see if that fixes it.

DrCyanide commented 3 months ago

I set up a venv and followed the instructions. That got my PyTorch up to 2.4.0+cu124 (I don't think I'd tried to update PyTorch for this project, so it's possible there was a mismatch between PyTorch and CUDA).

After that I got an OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

Indeed, my CUDA_HOME seems to be empty, both inside the venv and on my system. This should mean I'm past the Ninja error and into new territory. I'm seeing posts that it should be something like C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4, but I don't have an NVIDIA GPU Computing Toolkit folder. I've been checking which version of CUDA I have with the nvidia-smi command. It might be worth noting that nvcc --version doesn't work on my system.

Any ideas on how to find my actual CUDA path, so I can manually set that?
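
For anyone hitting the same question, here is a minimal sketch of one way to guess the toolkit root from Python. The function name guess_cuda_root is made up for illustration, and the lookup order (CUDA_HOME, then CUDA_PATH, then the folder above wherever nvcc lives) is an assumption; PyTorch's own lookup appears to work along similar lines, but this is not its code. Note that nvidia-smi reports the CUDA version the driver supports, so it can print a version even when no toolkit (and therefore no nvcc) is installed.

```python
import os
import shutil
from pathlib import Path


def guess_cuda_root(env=None):
    """Return a likely CUDA toolkit root as a Path, or None if nothing is found.

    Hypothetical helper: the lookup order is an assumption, not PyTorch's code.
    """
    env = os.environ if env is None else env
    # 1. Explicit environment variables win.
    for name in ("CUDA_HOME", "CUDA_PATH"):
        value = env.get(name)
        if value:
            return Path(value)
    # 2. Otherwise look for nvcc on PATH; the root is the parent of its bin folder.
    nvcc = shutil.which("nvcc", path=env.get("PATH", ""))
    if nvcc:
        return Path(nvcc).resolve().parent.parent
    return None


if __name__ == "__main__":
    print("CUDA root guess:", guess_cuda_root())
```

If this prints None, the toolkit is probably not installed (or not on PATH), which would match nvcc --version failing.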

jammm commented 3 months ago

Hmm. CUDA_HOME should have been automatically set for you when you installed the CUDA toolkit. Maybe try reinstalling CUDA?

DrCyanide commented 3 months ago

Reinstalled CUDA 12.4 (to make sure it'd be consistent with everything that was already installed and working), and I've got mixed results.

On my system (outside of the venv) I now have a CUDA_PATH that's populated (it seems to be used interchangeably with CUDA_HOME). nvcc --version now works and reports CUDA 12.4. Trying to run python gradio_app.py, I get a new runtime error: RuntimeError: D:\a\_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:891 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasnt able to be loaded. Please install the correct version of CUDA andcuDNN as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. I have onnxruntime 1.18.1, and the table at that URL says I should be using CUDA 12.x and cuDNN 9.x, which I am. Running where cudnn* reports that it couldn't find anything. I tried re-installing cuDNN and opening a new terminal, but still nothing.

Inside the venv both CUDA_PATH and CUDA_HOME are empty. I tried deleting the venv and creating a new one, but they're still empty. The error there is back to Ninja is required to load C++ extensions.
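
A rough Python equivalent of the where cudnn* check above, in case it helps someone debug the same thing. The helper name find_on_path is hypothetical. Windows error 126 generally means a module or one of its dependencies could not be found, so the idea is to confirm whether any cuDNN files are visible in the PATH folders of the current process at all.

```python
import fnmatch
import os
from pathlib import Path


def find_on_path(pattern, path=None):
    """List files in the PATH directories whose names match a glob pattern.

    Hypothetical helper; matching is case-insensitive, like `where` on Windows.
    """
    path = os.environ.get("PATH", "") if path is None else path
    matches = []
    for entry in path.split(os.pathsep):
        folder = Path(entry)
        if not entry or not folder.is_dir():
            continue  # skip empty or stale PATH entries
        try:
            names = sorted(os.listdir(folder))
        except OSError:
            continue  # unreadable folder; ignore it
        for name in names:
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append(folder / name)
    return matches


if __name__ == "__main__":
    for hit in find_on_path("cudnn*"):
        print(hit)
```

An empty result from inside the same terminal (or venv) that runs gradio_app.py suggests the cuDNN folder never made it onto PATH for that process.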

DrCyanide commented 3 months ago

I feel like I'm going insane. I tried uninstalling onnx, onnxruntime, and onnxruntime-gpu (since they seemed to be having issues finding the path), then re-installing everything with pip install -r requirements.txt from my system, and now I'm back to RuntimeError: Ninja is required to load C++ extensions

jammm commented 3 months ago

It's a bit of a weird issue, unfortunately. Typically, installing PyTorch itself should have automatically installed Ninja for you. Once Ninja is installed, running ninja --version from the command line should work. If it doesn't, look for ninja inside C:\Users\Username\AppData\Roaming\Python\Python310\Scripts and add that folder to PATH, then try again. If you can't find it in there, it means pip install Ninja is installing Ninja in a different directory altogether. The only solution I can think of in that case is to manually download the Ninja exe and add it to your PATH: https://github.com/ninja-build/ninja/releases
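
The check described above can be sketched in Python. This is a hedged illustration, not PyTorch's code: the helper names are made up, and it assumes the build step needs to launch ninja as an executable (which is what the ninja --version advice implies). It also prints the folders where pip usually puts console scripts for the current interpreter, which are the candidates to add to PATH.

```python
import os
import shutil
import subprocess
import sysconfig


def ninja_on_path(exe="ninja"):
    """Return the full path of the ninja executable found on PATH, or None."""
    return shutil.which(exe)


def ninja_runs(exe="ninja"):
    """Return True if `<exe> --version` can be launched and exits cleanly."""
    try:
        subprocess.check_output([exe, "--version"], stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        return False
    return True


def script_dirs():
    """Folders where pip usually places console scripts for this interpreter."""
    dirs = [sysconfig.get_path("scripts")]
    user_scheme = f"{os.name}_user"  # "nt_user" on Windows, "posix_user" elsewhere
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(sysconfig.get_path("scripts", user_scheme))
    return dirs


if __name__ == "__main__":
    print("ninja on PATH:", ninja_on_path())
    print("ninja runs:", ninja_runs())
    for folder in script_dirs():
        print("pip script folder:", folder)
```

Being able to import ninja in Python does not guarantee that ninja_runs() returns True; the executable has to be reachable through PATH as well.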

DrCyanide commented 3 months ago

> It's a bit of a weird issue unfortunately. Typically installing pytorch itself should have automatically installed ninja for you. When you install ninja ninja --version should work. If it doesn't, find it inside the bin folder in C:\Users\Username\AppData\Roaming\Python\Python310 and add that folder in PATH, then try again. If you can't find it in there, it means pip install Ninja is installing Ninja in a different directory altogether.

OK, that fixed it! When I was checking whether Ninja was installed, I was doing it from inside the Python interactive console, importing it and checking the version. It never occurred to me that the build might be trying to run Ninja as its own separate app from the command line.

DrCyanide commented 3 months ago

Just to be clear, if you get an OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root. even though your CUDA_HOME variable is set (you can check with echo %CUDA_HOME% on Windows), then you can try re-installing PyTorch to fix it. Pip install the build that matches the CUDA version you have installed. You can check the CUDA version with nvcc --version.
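
As a small sketch of that last step: the snippet below pulls the release number out of nvcc --version output and turns it into the cuXYZ suffix seen in the PyTorch version strings earlier in this thread (2.2.0+cu121, 2.4.0+cu124). Both function names are hypothetical, and the sample line is only an illustration of the "release X.Y" wording nvcc prints; paste in your own output instead.

```python
import re


def parse_nvcc_release(output):
    """Extract the 'major.minor' CUDA release from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def wheel_tag(release):
    """Turn '12.4' into the 'cu124' style suffix seen in PyTorch version strings."""
    major, minor = release.split(".")
    return f"cu{major}{minor}"


# Illustrative sample of the relevant nvcc output line (an assumption, not a log
# from this thread).
sample = "Cuda compilation tools, release 12.4, V12.4.131"
release = parse_nvcc_release(sample)
print(release, wheel_tag(release))
```

If the suffix printed here differs from the one in your installed torch version, that mismatch is a likely candidate for the error above.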