Closed NikVard closed 8 months ago
Hi @NikVard, could you give some more detail on your settings and the error message? Also note that if using deviceQuery does not work, you can always manually set parameters and disable the automatic GPU detection: https://brian2cuda.readthedocs.io/en/latest/introduction/cuda_configuration.html
Hi @mstimberg, thanks for the prompt reply. The MWE is as follows:
from brian2 import *
import brian2cuda
set_device("cuda_standalone")
# Set the path to the deviceQuery binary
prefs.devices.cuda_standalone.cuda_backend.device_query_path = "/usr/local/cuda-11.4/samples/1_Utilities/deviceQuery/deviceQuery"
# Run the test
brian2cuda.example_run()
The last line of the error traceback is "RuntimeError: Running 'nvidia-smi -L' failed. This typically means that you have no NVIDIA driver installed. Are you sure there is an NVIDIA GPU on this machine?"
Running the binary manually gives me the correct information (image attached):
I followed the instructions from the brian2cuda documentation found here (point 2). Manually setting the preferences devices.cuda_standalone.cuda_backend.detect_gpus = False, devices.cuda_standalone.cuda_backend.compute_capability = 7.2, and devices.cuda_standalone.cuda_backend.gpu_id = 0 seems to work; however, there are other errors (the make directives get killed by the system, which makes me wonder if there are other issues). If it helps, here is the full configuration as printed prior to running the model: output.txt.
Note that manually setting the above parameters leads to the example run completing successfully. I took a look at the code and there are provisions for running the deviceQuery binary (in utils/gputools.py), but it might be that a check verifying that the nvidia-smi binary is present is missing.
Let me know if there is anything I can test on my end or if I have neglected some information!
Just to be sure I understand it right: You do have the nvidia-smi binary, it just comes from an older driver version and does not support querying GPU information? Or do you not have the binary at all?
You need nvidia-smi even if you specify a custom deviceQuery path. If you don't have nvidia-smi at all, you can disable automatic GPU detection altogether, as @mstimberg mentioned:
prefs.devices.cuda_standalone.cuda_backend.detect_gpus = False
prefs.devices.cuda_standalone.cuda_backend.compute_capability = <compute_capability>
prefs.devices.cuda_standalone.cuda_backend.runtime_version = <runtime_version>
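As a rough sketch of the manual configuration discussed in this thread, the snippet below collects the preference names mentioned here into a dictionary and applies them through Brian2's dictionary-style prefs access. The values 7.2 and 0 are the ones reported earlier for the Jetson AGX Xavier. The helper names manual_gpu_prefs and apply_prefs are hypothetical and are not part of brian2cuda.

```python
# Common prefix of the brian2cuda backend preferences named in this thread.
PREFIX = "devices.cuda_standalone.cuda_backend"


def manual_gpu_prefs(compute_capability, gpu_id=0):
    """Return preference settings that bypass automatic GPU detection.

    Hypothetical helper for illustration; only the preference names
    come from the discussion above.
    """
    return {
        f"{PREFIX}.detect_gpus": False,
        f"{PREFIX}.compute_capability": float(compute_capability),
        f"{PREFIX}.gpu_id": int(gpu_id),
    }


def apply_prefs(settings):
    """Apply the settings to Brian2's global preferences."""
    # Imported lazily so manual_gpu_prefs() works without Brian2 installed.
    import brian2cuda  # noqa: F401  (registers the cuda_standalone preferences)
    from brian2 import prefs

    for name, value in settings.items():
        prefs[name] = value


# Values reported for the Jetson AGX Xavier earlier in this thread.
jetson_prefs = manual_gpu_prefs(7.2, gpu_id=0)
```

Calling apply_prefs(jetson_prefs) before running the model should then have the same effect as setting the three preferences by hand.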
@denisalevi Understood. On the Jetson platform, the nvidia-smi binary is not available at all. From the documentation, I understood that if you are using older drivers which do not support nvidia-smi, the deviceQuery binary would kick in instead.
On a similar note, should the runtime version also be set manually? Are there any other parameters that you would suggest I set?
Ah I see. Setting the deviceQuery binary is meant for setups in which nvidia-smi is available, but nvidia-smi --query-gpu=<parameter> is not (that option was only added around CUDA 11.6 I believe). But even if you set deviceQuery, nvidia-smi is still used to get a list of all available GPUs. So in your case, without nvidia-smi, it will still fail.
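The behaviour described above can be summarised in a small decision function. This is only an illustrative sketch and not brian2cuda's actual code: the function name choose_gpu_detection, its return labels, and the ordering of the two nvidia-smi routes are assumptions. The `which` parameter is injectable so the logic can be tried on a machine without a GPU.

```python
import shutil


def choose_gpu_detection(detect_gpus=True, device_query_path=None,
                         which=shutil.which):
    """Return a label for the GPU detection route that would apply.

    Hypothetical illustration of the logic described in this thread.
    """
    if not detect_gpus:
        # Automatic detection disabled: compute capability etc. are set by hand.
        return "manual"
    if which("nvidia-smi") is None:
        # The GPU list comes from `nvidia-smi -L`, so a deviceQuery path
        # alone cannot rescue a system without nvidia-smi (e.g. Jetson).
        return "fail: nvidia-smi missing"
    if device_query_path is not None:
        # nvidia-smi lists the GPUs, deviceQuery supplies their properties.
        return "nvidia-smi -L + deviceQuery"
    return "nvidia-smi -L + nvidia-smi --query-gpu"


# Simulate a Jetson-like system where nvidia-smi cannot be found.
print(choose_gpu_detection(device_query_path="/path/to/deviceQuery",
                           which=lambda name: None))
```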
The solution for you is then to disable automatic GPU detection. But you mentioned something about additional errors? If so, full error messages would be helpful.
The runtime version is set automatically via nvcc --version. As long as you have the nvcc binary available (which you need for compilation of the generated code anyway), you should be fine.
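To check what nvcc reports on a given machine, something along these lines may help. It is a minimal sketch that assumes the usual "release X.Y" wording in the output of nvcc --version; the helper names are hypothetical, the sample line is illustrative, and brian2cuda's own parsing may differ.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract 'X.Y' from nvcc --version output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def nvcc_runtime_version():
    """Run `nvcc --version` and return the release string, or None."""
    if shutil.which("nvcc") is None:
        return None  # nvcc is not on PATH
    result = subprocess.run(["nvcc", "--version"],
                            capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


# Illustrative sample line in the format nvcc typically prints.
sample = "Cuda compilation tools, release 11.4, V11.4.315"
print(parse_nvcc_release(sample))
```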
I just found a typo in the docs. It should be prefs.devices.cuda_standalone.cuda_backend.cuda_runtime_version. But as I said, you probably won't need to set it.
I was just about to post a message about the typo, but you beat me to it. The other issues I am facing have more to do with memory optimization and I think are not relevant to this issue. Thanks for the help, setting everything manually does work nicely and the test completes successfully!
I am currently working on running a Brian2 model on a Jetson AGX Xavier, which runs Ubuntu 20.04, and the NVIDIA drivers do not support the nvidia-smi binary. Instead, I am setting the path to the deviceQuery binary, which is ignored in favor of nvidia-smi. I noticed that in practice, if the nvidia-smi binary is not found, the function _run_command_with_output() will return an error and never use the fallback deviceQuery binary.