Closed flyingshan closed 1 year ago
Hi!
I'm not sure why training speed did not improve. Could you share some screenshots of your nvidia-smi on the two GPUs as well as the training logs from nerfstudio? That'll help us figure it out.
I noticed that during the training process, the GPU-Util of the two GPUs alternates between 0% and 100%, which suggests that only one GPU is working at any given time. For example:

Time 1:
```
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   33C    P0    61W / 250W |   9267MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   57C    P0   247W / 250W |  12997MiB / 40960MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```

Time 2:
```
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  Off  | 00000000:36:00.0 Off |                    0 |
| N/A   38C    P0   242W / 250W |  13231MiB / 40960MiB |     99%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  Off  | 00000000:37:00.0 Off |                    0 |
| N/A   48C    P0    63W / 250W |  12997MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
```
And here are some logs from nerfstudio:
```
Step (% Done)      Train Iter (time)    ETA (time)         Train Rays / Sec
-----------------------------------------------------------------------------------
30020 (100.07%)    1 s, 16.704 ms       23 h, 59 m, 40 s   277.02 K
30030 (100.10%)    961.287 ms           23 h, 59 m, 32 s   286.20 K
30040 (100.13%)    957.287 ms           23 h, 59 m, 22 s   296.96 K
30050 (100.17%)    953.794 ms           23 h, 59 m, 13 s   292.95 K
30060 (100.20%)    957.893 ms           23 h, 59 m, 3 s    281.41 K
30070 (100.23%)    953.860 ms           23 h, 58 m, 54 s   287.69 K
30080 (100.27%)    952.990 ms           23 h, 58 m, 44 s   285.33 K
30090 (100.30%)    957.117 ms           23 h, 58 m, 34 s   282.15 K
30100 (100.33%)    957.164 ms           23 h, 58 m, 25 s   289.42 K
30110 (100.37%)    960.392 ms           23 h, 58 m, 15 s   286.27 K
```
I installed in2n inside the official docker image of nerfstudio.
Ah yes, sorry: the two processes do only run alternately, so putting them on separate GPUs won't speed up training through parallelism. The reason we say it can increase speed is that when a single GPU is near maximum utilization, sharing it between InstructPix2Pix and NeRF training slows overall training down; moving InstructPix2Pix to a second GPU avoids that contention. Hopefully that makes sense.
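To illustrate the point, here is a toy sketch (not the actual in2n code; all function names and the loop structure are simplified assumptions) of why a second GPU does not parallelize anything: the InstructPix2Pix edit and the NeRF training step run back-to-back in the same loop, so only one device is busy at any moment.

```python
# Toy sketch of the alternating in2n-style training loop.
# `edit_image` and `nerf_train_step` are stand-ins, not real in2n functions.

def edit_image(image, device):
    # Placeholder for an InstructPix2Pix edit running on `device`.
    return image + 1

def nerf_train_step(image, device):
    # Placeholder for one NeRF optimization step on `device`.
    return image * 2

def train(num_iters, edit_every, ip2p_device="cuda:1", train_device="cuda:0"):
    image = 0
    busy_log = []  # records which device is active at each sub-step
    for i in range(num_iters):
        if i % edit_every == 0:
            image = edit_image(image, ip2p_device)
            busy_log.append(ip2p_device)  # ip2p GPU busy, training GPU idle
        image = nerf_train_step(image, train_device)
        busy_log.append(train_device)     # training GPU busy, ip2p GPU idle
    return busy_log

log = train(4, 2)
# The log is strictly sequential: the two devices never work at the same
# time, matching the alternating 0%/100% GPU-Util seen in nvidia-smi.
```

The benefit of `--pipeline.ip2p-device cuda:1` is therefore about relieving memory and compute pressure on the primary GPU, not about concurrency.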
Hello! Following the documentation, I set the flag `--pipeline.ip2p-device cuda:1` to use two GPUs to train the in2n model, but I found that the training speed did not improve. Is this normal? Hoping for your advice!
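For reference, the invocation presumably looks something like the following (the data and checkpoint paths are placeholders, not taken from this thread):

```shell
# Hypothetical example; {DATA_DIR} and {LOAD_DIR} are placeholders for
# your processed dataset and the pretrained nerfacto checkpoint directory.
ns-train in2n \
  --data {DATA_DIR} \
  --load-dir {LOAD_DIR} \
  --pipeline.ip2p-device cuda:1  # run InstructPix2Pix on the second GPU
```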