Open longyangqi opened 1 year ago
- You'd need to train the model with higher resolution. In our experiments, 540p takes around 40GB VRAM to train.
- The consumption mainly comes from the convolutional U-Net and the fact that we render / supervise on the entire image. So it may be possible to reduce VRAM requirements by patch-based training, which would require some engineering effort.
Thanks for your reply! Based on your reply, I have two more detailed questions:
- You can try jointly training at a low resolution to obtain masks that include the objects + shadows. Then, upsample the mask to a high resolution and optimize only the NeRF model with high resolution images.
- I don't have one in mind, the general idea is using not entire images when creating a training batch, but do a random crop. If you mainly care about the background, (1) may be an easier path.
Great idea! I will have a try.
Great work! I have two questions:
Very thanks!