tb2-sy opened this issue 1 year ago
Hi, yes, previously optimized models should be stored in CPU memory. I will double-check. What GPU are you using? How many frames are being considered?
Hi, I am using a 48 GB A40 and 1000 frames of images. Maybe my 1000 images cover too large a scene?
I have optimized longer sequences with 24 GB GPUs. Would you mind sharing the logs? At which point does it crash?
I added some MLP layers to the MLPRender module in the model. The specific error location is here. However, no error is reported while the first few tensorfs are being trained; the error only appears after a few hundred frames.
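Roughly, my change looks like this (a simplified sketch; the class name, dimensions, and arguments here are placeholders, not the actual MLPRender signature):

```python
import torch
import torch.nn as nn

class MLPRenderWithExtraLayers(nn.Module):
    # Hypothetical stand-in for the modified MLPRender: a plain feature-to-RGB MLP
    # with a configurable number of extra hidden layers. Each added layer increases
    # both the parameter count and the per-ray activation memory on the GPU.
    def __init__(self, in_dim=150, hidden_dim=128, out_dim=3, extra_layers=2):
        super().__init__()
        layers = [nn.Linear(in_dim, hidden_dim), nn.ReLU(inplace=True)]
        for _ in range(extra_layers):  # the additional layers I added
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU(inplace=True)]
        layers += [nn.Linear(hidden_dim, out_dim)]
        self.mlp = nn.Sequential(*layers)

    def forward(self, features):
        # features: [n_rays, in_dim]; returns RGB values in [0, 1]
        return torch.sigmoid(self.mlp(features))
```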
That is odd. In GPU memory there should only be additional poses, which are tiny. Do you know if it crashes during training or testing? (It renders some test frames during optimization.)
The error is raised during the forward pass of training. I cannot rule out that it happens because the number of training frames, and thus the number of pose parameters, keeps growing, and the GPU memory may be close to full. Maybe the pose parameters that are no longer needed could also be placed on the CPU, so that there are no other factors causing CUDA out of memory. A sketch of what I mean follows below.
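This is a rough sketch with hypothetical names, assuming the poses are kept in a dict of `nn.Parameter` tensors (not the actual localrf code):

```python
import torch
import torch.nn as nn

def offload_inactive_poses(poses, active_ids):
    # poses: dict mapping frame index -> nn.Parameter holding a camera pose.
    # Any pose that is no longer being optimized is moved to CPU memory.
    for frame_id, pose in poses.items():
        if frame_id not in active_ids and pose.is_cuda:
            poses[frame_id] = nn.Parameter(pose.detach().cpu(),
                                           requires_grad=pose.requires_grad)
    torch.cuda.empty_cache()  # give the freed blocks back to the CUDA allocator
```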
OK, I'll recheck and make sure there is nothing unused in GPU memory this afternoon.
Thank you so much!
I now delete optimizers and compute the alpha mask on the CPU. Please let me know if the issue remains. See https://github.com/facebookresearch/localrf/commit/3905e3988e6f0e977a625b8f1f3710e90442f06b
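In spirit, the change does something like the following (a simplified sketch with hypothetical function names; the actual code is in the commit linked above):

```python
import torch

def finalize_local_rf(tensorf, optimizer):
    # Once a local radiance field is done being optimized, its optimizer state
    # (e.g. Adam moments) is no longer needed, so deleting it frees GPU memory,
    # and the field itself only needs to be kept in CPU memory.
    del optimizer
    tensorf.to("cpu")
    torch.cuda.empty_cache()

@torch.no_grad()
def compute_alpha_mask_on_cpu(density_grid, threshold=1e-3):
    # Thresholding the density grid on CPU avoids a large temporary GPU allocation.
    return density_grid.cpu() > threshold
```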
Okay, I'll test it now. One last question: will this change affect the model's performance?
No, it should be the same.
Thanks for your nice work! I'm applying this model to a very long video trajectory, but I'm running into CUDA out of memory. As I understand the code, all previously optimized tensorfs are moved to the CPU, which greatly reduces CUDA memory usage. In theory, does that mean training supports arbitrarily long video sequences? Yet GPU memory is still exceeded during training. What could be the reason? Looking forward to your reply.
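To illustrate my understanding (a minimal sketch with made-up names, not the actual localrf code): only the field currently being optimized needs to live on the GPU, so GPU memory should stay roughly constant no matter how long the video is.

```python
import torch

class LocalRFQueue:
    """Sketch: keep only the field being optimized on the GPU; archive the rest on CPU."""

    def __init__(self, make_rf):
        self.make_rf = make_rf   # factory creating a fresh local radiance field (nn.Module)
        self.archived = []       # previously optimized fields, stored in CPU memory
        self.current = None      # the single field occupying GPU memory

    def start_new(self):
        if self.current is not None:
            self.archived.append(self.current.to("cpu"))  # offload the finished field
            torch.cuda.empty_cache()
        self.current = self.make_rf().to("cuda")
        return self.current
```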