Closed by braceal.
This is interesting; I've never seen this before. I'll try to reproduce it today.
Anda
From: Alex Brace <notifications@github.com>
Sent: Tuesday, July 21, 2020 11:00:03 PM
To: braceal/molecules <molecules@noreply.github.com>
Cc: Trifan, Anda <atrifan2@illinois.edu>; Mention <mention@noreply.github.com>
Subject: [braceal/molecules] "CUDA error: invalid device ordinal" when non 0 gpus are specified. (#26)
python examples/pytorch/example_vae.py -i ../data/contact_maps.h5 -o ../output/ -m 3 -t symmetric -e 2 -b 128 -E 1 -D 2
CUDA devices: 1,2
Traceback (most recent call last):
File "examples/pytorch/example_vae.py", line 141, in
This happens when we set the encoder GPU to 1 and the decoder GPU to 2. Any idea what is going on here? @atrifan2 (https://github.com/atrifan2)
Perhaps this (https://github.com/allenai/allennlp/issues/1090) could help.
I am not sure how PyTorch handles GPUs.
But try export CUDA_VISIBLE_DEVICES=1,2 in the shell first, and then just run with -E 0 -D 1.
Don't set the GPU environment variable inside the script; set it outside, in the shell. With CUDA_VISIBLE_DEVICES=2,3, the global device IDs 2 and 3 become 0 and 1 inside that process.
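To make the remapping concrete, here is a small plain-Python sketch (no PyTorch needed) that mimics how a process sees its GPUs once CUDA_VISIBLE_DEVICES is set: the i-th listed global ID becomes local ordinal i. The helper names (visible_devices, local_ordinal_map, local_ordinal) are hypothetical, and the parser is a simplification that only handles integer IDs, not the GPU UUIDs or other forms the real CUDA runtime also accepts.

```python
from typing import Dict, List


def visible_devices(env_value: str) -> List[int]:
    """Parse a CUDA_VISIBLE_DEVICES value such as "2,3" into global GPU IDs."""
    # Simplification: only integer IDs are handled (no UUIDs or MIG names).
    return [int(tok) for tok in env_value.split(",") if tok.strip()]


def local_ordinal_map(env_value: str) -> Dict[int, int]:
    """Map each listed global GPU ID to the ordinal the process will see."""
    # The i-th entry in CUDA_VISIBLE_DEVICES becomes cuda:i inside the process.
    return {global_id: i for i, global_id in enumerate(visible_devices(env_value))}


def local_ordinal(env_value: str, global_id: int) -> int:
    """Return the in-process ordinal for a global GPU ID, or raise if it is hidden."""
    ids = visible_devices(env_value)
    if global_id not in ids:
        raise ValueError(
            f"GPU {global_id} is not visible with CUDA_VISIBLE_DEVICES={env_value!r}"
        )
    return ids.index(global_id)


if __name__ == "__main__":
    # Global GPUs 2 and 3 appear as cuda:0 and cuda:1 inside the process.
    print(local_ordinal_map("2,3"))  # {2: 0, 3: 1}
```

Under this model, exporting CUDA_VISIBLE_DEVICES=1,2 and running with -E 0 -D 1 puts the encoder on global GPU 1 and the decoder on global GPU 2.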
Branch: feature/multi-gpu-vae
This happens when we set the encoder GPU to 1 and the decoder GPU to 2 (-E 1, -D 2). Any idea what is going on here? @atrifan2
Settings that work: -E 0 -D 1, -E 1 -D 0
Settings that don't: -E 0 -D 2, -E 1 -D 1
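One plausible reading of the -D 2 failures, assuming the script restricts the process to the GPUs named by -E and -D (the "CUDA devices: 1,2" line in the output hints at this but the source does not confirm it), is that the flags are then still treated as global IDs even though only local ordinals 0 to N-1 exist. A pre-flight check like the hypothetical check_ordinals below would turn the opaque "invalid device ordinal" into a readable error; in a real script device_count would come from torch.cuda.device_count(). This sketch does not try to explain the -E 1 -D 1 case.

```python
from typing import Sequence


def check_ordinals(requested: Sequence[int], device_count: int) -> None:
    """Raise a readable error if any requested ordinal is not visible.

    Valid ordinals inside a process are 0 .. device_count - 1, regardless of
    which physical GPUs CUDA_VISIBLE_DEVICES selected.
    """
    bad = [o for o in requested if not 0 <= o < device_count]
    if bad:
        raise ValueError(
            f"Requested GPU ordinal(s) {bad} but only {device_count} device(s) "
            f"are visible; valid ordinals are 0..{device_count - 1}"
        )


if __name__ == "__main__":
    # Two GPUs visible (e.g. CUDA_VISIBLE_DEVICES=1,2): -E 0 -D 1 passes...
    check_ordinals([0, 1], device_count=2)
    # ...while -E 1 -D 2 is rejected, because ordinal 2 does not exist.
    try:
        check_ordinals([1, 2], device_count=2)
    except ValueError as err:
        print(err)
```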