facebookresearch / OmnimatteRF

A matting method that combines dynamic 2D foreground layers and a 3D background model.
MIT License

High resolution output with affordable memory #4

Open longyangqi opened 11 months ago

longyangqi commented 11 months ago

Great work! I have two questions:

  1. Resolution: if I want to render scenes at a higher resolution (e.g. the solo video at 1080p), what should I do?
  2. Memory: rendering at high resolution costs too much memory. Is there an approach to save memory (e.g. reducing the cached features)?

Many thanks!

logchan commented 10 months ago
  1. You'd need to train the model with higher resolution. In our experiments, 540p takes around 40GB VRAM to train.
  2. The consumption mainly comes from the convolutional U-Net and the fact that we render / supervise on the entire image. So it may be possible to reduce VRAM requirements by patch-based training, which would require some engineering effort.
longyangqi commented 10 months ago

Thanks for your reply! Based on your reply, I have two more detailed questions:

  1. "The consumption mainly comes from the convolutional U-Net": so the GPU cost mainly comes from the dynamic 2D foreground layers rather than the NeRF model for the 3D background? In my case, I mainly care about the background. Could I reduce the cost of the foreground layers to work around it?
  2. "Patch-based training": is there any code or repository I can refer to?
logchan commented 10 months ago
  1. You can try jointly training at a low resolution to obtain masks that include the objects + shadows. Then, upsample the masks to a high resolution and optimize only the NeRF model with high-resolution images.
  2. I don't have one in mind. The general idea is to build each training batch from random crops rather than entire images. If you mainly care about the background, (1) may be an easier path.
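
A rough sketch of both ideas in plain NumPy, for illustration only. None of this comes from the OmnimatteRF code base: `random_crop`, `upsample_mask`, and `masked_background_loss` are hypothetical helper names, the mask is assumed to be 1 on objects + shadows and 0 on background, and nearest-neighbour upsampling by an integer factor is an assumption made to keep the example short.

```python
import numpy as np


def random_crop(frames, masks, patch, rng):
    """Take one random square patch per frame instead of the whole image.

    frames: (N, H, W, 3) array, masks: (N, H, W) array (hypothetical layout).
    Returns crops shaped (N, patch, patch, 3) and (N, patch, patch).
    """
    n, h, w, _ = frames.shape
    tops = rng.integers(0, h - patch + 1, size=n)
    lefts = rng.integers(0, w - patch + 1, size=n)
    img = np.stack([frames[i, t:t + patch, l:l + patch]
                    for i, (t, l) in enumerate(zip(tops, lefts))])
    msk = np.stack([masks[i, t:t + patch, l:l + patch]
                    for i, (t, l) in enumerate(zip(tops, lefts))])
    return img, msk


def upsample_mask(mask, scale):
    """Nearest-neighbour upsampling of a low-resolution (H, W) mask."""
    return np.repeat(np.repeat(mask, scale, axis=0), scale, axis=1)


def masked_background_loss(pred, target, fg_mask):
    """Mean squared error over pixels outside the foreground mask."""
    weight = 1.0 - fg_mask[..., None]  # 1 on background, 0 on objects + shadows
    denom = max(weight.sum() * pred.shape[-1], 1e-8)
    return float((weight * (pred - target) ** 2).sum() / denom)
```

With helpers like these, path (1) would amount to upsampling the low-resolution masks once and supervising the background model only where the mask is 0, while path (2) would feed `random_crop` outputs to the full model so that the U-Net only ever sees patches.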
longyangqi commented 10 months ago

Great idea! I will give it a try.