eddiewrc opened 2 years ago
I have an addition to make: these are the GPU settings on my machine (3 GPUs). Apparently the error happens only when I try to use GPUs 1 and 2, and the library works fine on what PyTorch recognizes as cuda:0 (which happens to be Quadro #1).
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN Xp     Off  | 00000000:09:00.0 Off |                  N/A |
| 30%   52C    P2    65W / 250W |   1521MiB / 12196MiB |     20%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro GV100        Off  | 00000000:83:00.0 Off |                  Off |
| 38%   52C    P2    40W / 250W |   3379MiB / 32508MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro GV100        Off  | 00000000:84:00.0 Off |                  Off |
| 36%   49C    P2    40W / 250W |      8MiB / 32508MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
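In this listing, `nvidia-smi` shows GPU 0 as the TITAN Xp, but PyTorch reports cuda:0 as a Quadro. The two tools number devices differently. `nvidia-smi` orders GPUs by PCI bus ID. The CUDA runtime's default (`CUDA_DEVICE_ORDER=FASTEST_FIRST`) places what it considers the faster devices first. Setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` before CUDA is initialized makes the two numberings agree.

The sketch below sets that variable and prints PyTorch's view of the devices. The helper name `list_cuda_devices` is hypothetical and is not part of the library discussed here.

```python
import os

# Enumerate GPUs in PCI bus order, the same order nvidia-smi uses.
# The CUDA default is FASTEST_FIRST. This only takes effect if it is set
# before CUDA is initialized; setting it before importing torch is safest.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"


def list_cuda_devices() -> list:
    """Return (index, name) pairs in the order PyTorch sees the GPUs.

    Returns an empty list if PyTorch is not installed or no GPU is visible.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i)) for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    for index, name in list_cuda_devices():
        print(f"cuda:{index} -> {name}")
```

With the variable set, cuda:0 should correspond to the TITAN Xp on bus 09:00.0 in the table above. When reporting which GPUs fail, this makes it clear which physical card is meant.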
Hello, I also had this issue, but I found a workaround: if you call torch.cuda.set_device(1) before sending the model to the device with model.to('cuda:1'), it works fine :)
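The workaround above can be wrapped in a small helper. One plausible explanation, which is not confirmed in this thread, is that some compiled CUDA extensions allocate memory or launch kernels on the *current* device rather than on the device of their input tensors. The current device is `torch.cuda.current_device()`, which defaults to cuda:0. Making the current device match the model's device would sidestep that.

The following is a minimal sketch under that assumption. `move_to_gpu` is a hypothetical helper name, not part of the library.

```python
import torch


def move_to_gpu(model: torch.nn.Module, index: int) -> torch.nn.Module:
    """Hypothetical helper: make cuda:<index> the current device, then move the model.

    Assumes the extension relies on torch.cuda.current_device() internally, so
    the current device must match the device holding the parameters and inputs.
    """
    device = torch.device("cuda", index)
    torch.cuda.set_device(device)  # set this before the model or inputs are used
    return model.to(device)


if __name__ == "__main__" and torch.cuda.device_count() > 1:
    model = move_to_gpu(torch.nn.Linear(4, 2), 1)
    x = torch.randn(3, 4, device="cuda:1")
    # Alternative: scope the current device to a block with a context manager.
    with torch.cuda.device(1):
        y = model(x)
    print(y.device)
```

Another commonly used option is to expose only one GPU to the process, for example with `CUDA_VISIBLE_DEVICES=1`. The selected card then appears as cuda:0 inside the process, so code that implicitly assumes device 0 keeps working.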
Hi, first of all, thanks for sharing this library with all of us! Unfortunately, I am encountering a few problems while trying to run it. In particular, I tried to build the following network, which is supposed to take as input a sparse tensor of shape (8192, 16384). Part of it is now commented out because I was trying to locate the origin of the problem. Apparently the error already occurs with just the first Convolution module, so I commented out the rest for now.
The error I get is pasted below. The GPU is a Quadro GV100, the system CUDA version is 11.4, and PyTorch is 1.11.0 (py3.9_cuda11.3_cudnn8.2.0_0).
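The version details above can also be collected programmatically. This is useful when comparing setups across reports.

One caveat: the "CUDA Version" field printed by `nvidia-smi` is the highest CUDA version the driver supports. It is not necessarily the installed toolkit, and it differs from `torch.version.cuda`, which is the CUDA version PyTorch was built with (11.3 here).

Below is a minimal sketch using only standard PyTorch attributes. The function name `environment_report` is hypothetical. PyTorch also ships a fuller report via `python -m torch.utils.collect_env`.

```python
import platform


def environment_report() -> dict:
    """Collect version info commonly requested in bug reports (torch is optional)."""
    report = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda         # CUDA version PyTorch was built with
    report["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    report["gpus"] = [
        torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
    ]
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```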
The error: