bennyguo / instant-nsr-pl

Neural Surface reconstruction based on Instant-NGP. Efficient and customizable boilerplate for your research projects. Train NeuS in 10min!
MIT License
856 stars · 84 forks

GPU out of memory at trainer.test() stage but no issue during trainer.fit() stage #61

Open xiaohulihutu opened 1 year ago

xiaohulihutu commented 1 year ago

Hi there,

Could you please give me some hints on how to solve the issue below?

When running nsr_pl, even though my training and test sets both contain 73 images, the testing stage runs out of GPU memory. With the DDP strategy during training, I can see the memory spread evenly across 4 GPUs, but during testing all of the memory accumulates on a single GPU and the run stops with an out-of-memory error. The trainer is the same, so why is this happening? Is there anything I can do to fix it? Thanks in advance.

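For reference, a minimal sketch of the setup described above, assuming a standard PyTorch Lightning Trainer with the DDP strategy; the system and datamodule names below are placeholders, not this repo's actual classes:

```python
# Sketch of the reported setup (an assumption, not the repo's launch code):
# the same 4-GPU DDP Trainer is used for both fit() and test().
import pytorch_lightning as pl

def reproduce(system: pl.LightningModule, dm: pl.LightningDataModule):
    trainer = pl.Trainer(accelerator="gpu", devices=4, strategy="ddp")
    trainer.fit(system, datamodule=dm)   # observed: memory spread evenly across the 4 GPUs
    trainer.test(system, datamodule=dm)  # observed: memory collects on a single GPU and OOMs
```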

bennyguo commented 1 year ago

Hi! Sorry, I haven't fully tested the code on multiple GPUs. The problem likely originates from the aggregation of all outputs after testing: https://github.com/bennyguo/instant-nsr-pl/blob/2daaa53c9bf5dabefc41236c92ed1c2fa7cbcf73/systems/nerf.py#L189. A temporary fix is to use only 1 GPU for testing; you can easily resume from trained checkpoints as explained in the README. I'll mark this as an enhancement and experiment with it myself when I have time.