Emerge-Lab / gpudrive

GPU-acceleration of Nocturne via Madrona
MIT License

GPU compilation failure #244

Open johnviljoen opened 1 week ago

johnviljoen commented 1 week ago

This may be similar to the error another issue author is having; I understand if you want to merge this issue with that one.

My specifications

(please tell me if there is more information you need):

gpudrive installation:

neofetch basics:

NVIDIA (smi/nvcc outputs):

My error

The pytests run correctly, but I noticed that they all run with Madrona in CPU execution mode only. The following script just switches that to CUDA, as is done in the tutorial notebooks (which also fail at exactly the final function call here):

import os
from pathlib import Path
import seaborn as sns
import gpudrive
from pygpudrive.env.config import SceneConfig
from pygpudrive.env.scene_selector import select_scenes

working_dir = Path.cwd()
while working_dir.name != 'gpudrive':
    working_dir = working_dir.parent
    if working_dir == Path.home():
        raise FileNotFoundError("Base directory 'gpudrive' not found")
os.chdir(working_dir)

scene_config = SceneConfig(path="data", num_scenes=1)

sim = gpudrive.SimManager(
    exec_mode=gpudrive.madrona.ExecMode.CUDA,  # Specify the execution mode
    gpu_id=0,
    scenes=select_scenes(scene_config),
    params=gpudrive.Parameters(),  # Environment parameters
)

For me, this script currently fails with a simple compilation error. The error is raised deep within the Madrona setup, which suggests to me that it is not a problem with my code but probably a versioning issue somewhere:

Compiling GPU engine code:
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/memory.cpp
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp(232): error: namespace "std" has no member "max"
    max_column_size = std::max((uint32_t)col_row_bytes, max_column_size);
                           ^

1 error detected in the compilation of "/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp".

Error at /home/jovi/Documents/gpudrive/external/madrona/src/mw/cpp_compile.cpp:100 in CompileOutput madrona::cu::jitCompileCPPSrc(const char , const char , const char , uint32_t, const char , uint32_t, bool)
NVRTC_ERROR_COMPILATION
Aborted (core dumped)

eugenevinitsky commented 1 week ago

Thanks for raising this issue. We are looking into this ASAP; we don't want new users to be blocked on compilation.

aaravpandya commented 1 week ago

Hi @johnviljoen, thanks for the detailed error report. I notice that your NVCC version is 11.5. It needs to be at least 12.2. For reference, this is the NVCC output from a desktop we use.

Please update the NVCC version, and let me know if that fixes the issue.

johnviljoen commented 1 week ago

Hi, I have updated nvcc (to 12.5 in my case) and I still get the same error at the time of writing. Here is the full console output, where test.py is the small CUDA test script I posted above.

(gpudrive) jovi@jovi-B550I-AORUS-PRO-AX:~/Documents/gpudrive$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0
(gpudrive) jovi@jovi-B550I-AORUS-PRO-AX:~/Documents/gpudrive$ python test.py
Compiling GPU engine code:
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/memory.cpp
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp
/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp(232): error: namespace "std" has no member "max"
    max_column_size = std::max((uint32_t)col_row_bytes, max_column_size);
                           ^

1 error detected in the compilation of "/home/jovi/Documents/gpudrive/external/madrona/src/mw/device/state.cpp".

Error at /home/jovi/Documents/gpudrive/external/madrona/src/mw/cpp_compile.cpp:100 in CompileOutput madrona::cu::jitCompileCPPSrc(const char , const char , const char , uint32_t, const char , uint32_t, bool)
NVRTC_ERROR_COMPILATION
Aborted (core dumped)
(gpudrive) jovi@jovi-B550I-AORUS-PRO-AX:~/Documents/gpudrive$

aaravpandya commented 1 week ago

Hi @johnviljoen, I think 12.5 is not supported yet, since we have not synced up with upstream Madrona and are therefore still on an older toolchain. I suggest using CUDA 12.2, or at most 12.4. Also, there is a list of dependencies here that a user reported to have worked for them in a separate issue thread (here).
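For anyone who wants to check their toolchain before running the CUDA script, here is a minimal sketch that parses the `nvcc --version` output and compares it with the 12.2 to 12.4 range suggested above. The function names and the inclusive bounds are my own assumptions, not part of gpudrive or Madrona. Note also that the failure in this thread comes from NVRTC at runtime, and the NVRTC library that gets loaded may not match the `nvcc` on your PATH, so treat this as a first sanity check only.

```python
import re
import shutil
import subprocess

# Range suggested in this thread. Assumption: bounds are inclusive and
# only the major.minor release number matters.
MIN_SUPPORTED = (12, 2)
MAX_SUPPORTED = (12, 4)


def parse_nvcc_release(version_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(release):
    """True if the release lies inside the range suggested in this thread."""
    return MIN_SUPPORTED <= release <= MAX_SUPPORTED


def installed_nvcc_release():
    """Query the nvcc found on PATH; returns None if nvcc is not installed."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_nvcc_release()
    if release is None:
        print("nvcc not found on PATH, or its version could not be parsed")
    elif is_supported(release):
        print(f"nvcc {release[0]}.{release[1]} is inside the suggested range")
    else:
        print(f"nvcc {release[0]}.{release[1]} is outside the suggested range")
```

Run against the outputs posted in this thread, both 11.5 and 12.5 would be reported as outside the suggested range.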

In the meantime, we are working on publishing a Dockerfile that has all the necessary dependencies preinstalled and can be used directly.

johnviljoen commented 1 week ago

Downgrading CUDA ended up breaking my machine (something I'm sure we are all familiar with, haha). I will get back to this thread ASAP once I get around to fixing it.