Open ChaosAdmStudent opened 2 months ago
I'm curious whether our original code can run on different CUDA devices on your server. If you do not set CUDA_VISIBLE_DEVICES, you need to revise this code: https://github.com/Xinjie-Q/GaussianImage/blob/f06988cce9ef8a40eed847f1c8b241439eed4624/train.py#L28. In the code, we have specified that it runs on cuda:0.
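For reference, a minimal sketch of the CUDA_VISIBLE_DEVICES route, assuming the variable is set from Python rather than in the shell. restrict_to_gpu is a hypothetical helper name, not something from the repository. The variable has to be set before CUDA is initialized, so the safest place is before torch is imported:

```python
import os


def restrict_to_gpu(physical_index: int) -> None:
    """Expose a single physical GPU to this process (hypothetical helper).

    Must be called before torch (or anything else that initializes CUDA)
    is imported, otherwise the setting has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)


restrict_to_gpu(1)
# Import torch only after this point. Inside this process the selected
# GPU is then addressed as cuda:0, so code hardcoded to cuda:0 uses it.
```

The shell equivalent is to prefix the launch command with CUDA_VISIBLE_DEVICES=1.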
In my codebase, I am just using the project_gaussians_2d and rasterize_gaussians_sum functions instead of creating a SimpleTrainer2d class instance to start the training. I make sure to move all the inputs to these functions to a user-specified device, but when I used anything other than cuda:0, it initially gave me an error.
I assumed it could be because the CUDA code runs on cuda:0 by default (for rendering). So I added cudaSetDevice(device_id); in the bindings.cu file for these two functions and recompiled the package. After doing this, it started working, but the code ran much slower on the other CUDA devices. Inspecting nvidia-smi, I could see that when the user-specified device is cuda:1 or cuda:2, some part of the script is still hosted on cuda:0. I guess the slowdown is because some data is repeatedly being transferred back and forth between CUDA devices. Will I have to add cudaSetDevice(device_id); to every custom CUDA kernel that is implemented?
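On the question of adding cudaSetDevice to every kernel: one alternative that avoids editing each binding may be to make the target GPU the current device on the Python side around each call, using torch.cuda.device as a context manager. The sketch below is an assumption, not something from the GaussianImage repository. call_on_device and its guard parameter are hypothetical names, and guard exists only so the helper can be tried without a GPU:

```python
def call_on_device(fn, device, *args, guard=None, **kwargs):
    """Call fn(*args, **kwargs) while `device` is the current CUDA device.

    Hypothetical helper. By default it uses torch.cuda.device, which
    switches the current device and restores the previous one on exit.
    """
    if guard is None:
        # Imported lazily so the helper can be defined without torch.
        import torch
        guard = torch.cuda.device
    with guard(device):
        return fn(*args, **kwargs)
```

Usage would be along the lines of call_on_device(project_gaussians_2d, some_input.device, followed by the arguments you already pass). Whether this also removes the extra allocation on cuda:0 or the slowdown is untested here.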
I tried to modify the codebase a little to allow me to run it on a different CUDA device, but I always end up with an "illegal memory access was encountered" error if I use anything other than cuda:0. Any idea why this is happening and how I can fix it?
I believe the error originates in the project_gaussians_2d function. If I use the cuda:1 device and try to print xys (or any other output from this function), I get the CUDA illegal memory access error. However, if I use cuda:0, they print out just fine.
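Since CUDA kernels run asynchronously, an illegal memory access is often reported at the first call that synchronizes (such as printing xys) rather than at the point where it happened. One thing worth ruling out first is an input that was left on another device. A minimal sketch of such a check, where assert_same_device is a hypothetical helper that only assumes each argument has a .device attribute, as torch tensors do:

```python
def assert_same_device(**named_tensors):
    """Return the shared device of all inputs, or raise if they differ.

    Hypothetical helper: works with any objects exposing `.device`.
    """
    if not named_tensors:
        raise ValueError("no tensors given")
    # Map each argument name to its device, e.g. {"means": "cuda:1"}.
    devices = {name: str(t.device) for name, t in named_tensors.items()}
    if len(set(devices.values())) > 1:
        raise ValueError(f"inputs are on different devices: {devices}")
    return next(iter(devices.values()))
```

Calling it with keyword arguments for every tensor passed to project_gaussians_2d makes the error message name the input that is on the wrong device.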