Open EricLBuehler opened 5 months ago
Is pytorch able to see the GPU? Also what cuda toolkit version is being targeted by cudarc (if using cuda-version-from-build-system, is it being compiled on this machine?)
@EricLBuehler any more information on this issue? Will close in a week if not
@coreylowman sorry for not getting back! I am running this on my GPU and Pytorch can see it (torch.cuda.is_available() == True).
@EricLBuehler are there any differences with dynamic loading vs dynamic linking features for cudarc? Also curious about what toolkit version you are targeting in cudarc features
I am using cuda-version-from-build-system and dynamic-linking. How should I try dynamic loading?
If you don't enable the dynamic-linking feature it will use dynamic loading.
🤔 Could you try targeting 12.2 (cuda-12020) instead of version from build system? Just curious if that would change anything.
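For reference, the versioned feature names quoted in this thread (cuda-12020, cuda-12050) look like the CUDA_VERSION encoding, major * 1000 + minor * 10, so cuda-12020 would be 12.2. Below is a minimal sketch, assuming that encoding, which decodes a feature name and compares it with the highest CUDA version a driver reports (for example the "CUDA Version" field printed by nvidia-smi). The helper names `parse_cuda_feature` and `not_newer_than_driver` are hypothetical and not part of cudarc, and the check is deliberately conservative: it flags a possible mismatch, it does not diagnose the error.

```rust
/// Decode a feature name such as "cuda-12020" into (major, minor).
/// Assumption: the digits follow the CUDA_VERSION encoding,
/// i.e. major * 1000 + minor * 10.
fn parse_cuda_feature(feature: &str) -> Option<(u32, u32)> {
    let digits = feature.strip_prefix("cuda-")?;
    // Non-numeric suffixes (e.g. "version-from-build-system") yield None.
    let n: u32 = digits.parse().ok()?;
    Some((n / 1000, (n % 1000) / 10))
}

/// Conservative check: is the targeted toolkit version no newer than the
/// CUDA version the driver reports? Tuples compare lexicographically.
fn not_newer_than_driver(target: (u32, u32), driver: (u32, u32)) -> bool {
    target <= driver
}

fn main() {
    // Example driver version; substitute whatever nvidia-smi reports.
    let driver = (12, 4);
    for feature in ["cuda-12020", "cuda-12050", "cuda-version-from-build-system"] {
        match parse_cuda_feature(feature) {
            Some(v) => println!(
                "{feature}: CUDA {}.{}, not newer than driver: {}",
                v.0,
                v.1,
                not_newer_than_driver(v, driver)
            ),
            None => println!("{feature}: no explicit version in the feature name"),
        }
    }
}
```

If the targeted version turns out to be newer than what the driver reports, trying a lower versioned feature is a cheap experiment before digging further.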
Hmm yeah, same error. Current:
cudarc = { version = "0.11.5", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-12020"], default-features=false }
I got nothing off the top of my head. Do you get this error if you git clone cudarc and try to run the unit tests?
cargo test --tests --no-default-features -F std,cuda-12050,driver
Is this running inside a docker container?
If that doesn't work, I'd probably drop down to the C++ level and verify that a simple example linking against CUDA can find the GPU. If that fails too, it at least tells us that PyTorch is doing something special that we need to copy.
Hi both, I also have a similar error:
DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted
[jzhao399@atl1-1-02-018-25-0 release]$ which nvidia-smi
/usr/bin/nvidia-smi
[jzhao399@atl1-1-02-018-25-0 release]$ nvidia-smi
Wed Jul 17 11:25:54 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          On  |   00000000:C1:00.0 Off |                    0 |
| N/A   34C    P0             43W /  250W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
With PyTorch this can be solved, but I am not sure how to solve it here.
Thanks,
Jianshu
Hello all,
Thanks for your great work here! When I run using cudarc, I get the error:
Here is my system information:
I would appreciate any help!