High resolution output with affordable memory

facebookresearch / OmnimatteRF

A matting method that combines dynamic 2D foreground layers and a 3D background model.

MIT License


High resolution output with affordable memory #4

Status: Open. longyangqi opened this issue 1 year ago.

longyangqi commented 1 year ago

Great work! I have two questions:

1. Resolution: if I want to render scenes at a higher resolution (e.g. the solo video at 1080p), what should I do?
2. Memory: if I render at high resolution, the memory cost is too large. Are there any approaches to save memory (e.g. reducing the cached features)?

Many thanks!

logchan commented 1 year ago

You'd need to train the model at a higher resolution. In our experiments, training at 540p takes around 40GB of VRAM.
The consumption mainly comes from the convolutional U-Net and the fact that we render / supervise on the entire image. So it may be possible to reduce VRAM requirements by patch-based training, which would require some engineering effort.
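For scale, assuming 540p means 960x540: that is 518,400 pixels, while 1920x1080 is 2,073,600, exactly 4x as many. If memory grew roughly in proportion to the pixels rendered and supervised per step (an assumption; fixed costs such as model weights do not grow with resolution), full-frame training at 1080p would need on the order of 4 x 40GB = 160GB. Patch-based training bounds the pixels per step by the patch size instead of the frame size. The sketch below shows only the random-crop sampling half of that idea, assuming frames are NumPy arrays of shape (H, W, 3). `random_crop` and `make_patch_batch` are hypothetical helpers, not part of OmnimatteRF, and the sketch does not address how the U-Net or the losses behave on crops, which is where the engineering effort mentioned above would go.

```python
import numpy as np


def random_crop(image: np.ndarray, patch_h: int, patch_w: int, rng: np.random.Generator):
    """Return a random (patch_h, patch_w) patch of an (H, W, C) image and its top-left corner."""
    h, w = image.shape[:2]
    if patch_h > h or patch_w > w:
        raise ValueError("patch is larger than the image")
    # integers() excludes the upper bound, so the patch always fits inside the image.
    top = int(rng.integers(0, h - patch_h + 1))
    left = int(rng.integers(0, w - patch_w + 1))
    return image[top:top + patch_h, left:left + patch_w], (top, left)


def make_patch_batch(frames, patch_h: int, patch_w: int, batch_size: int, rng: np.random.Generator):
    """Build a batch of random crops from randomly chosen frames.

    Returns the stacked patches and, per patch, (frame_index, (top, left)) so the
    matching rays, masks or flow could be cropped with the same window.
    """
    frame_ids = rng.integers(0, len(frames), size=batch_size)
    patches, windows = [], []
    for i in frame_ids:
        patch, corner = random_crop(frames[i], patch_h, patch_w, rng)
        patches.append(patch)
        windows.append((int(i), corner))
    return np.stack(patches), windows


# Usage: four 256x256 patches drawn from three synthetic 1080p frames.
rng = np.random.default_rng(0)
frames = [np.zeros((1080, 1920, 3), dtype=np.uint8) for _ in range(3)]
batch, windows = make_patch_batch(frames, 256, 256, batch_size=4, rng=rng)
print(batch.shape)  # (4, 256, 256, 3)
```

Returning the crop window alongside each patch is a design choice here: whatever per-pixel quantities the model renders or supervises would need to be cut with the same window so they stay aligned with the image patch.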

longyangqi commented 1 year ago

Thanks for your reply! Based on it, I have two more specific questions:

1. "The consumption mainly comes from the convolutional U-Net": so the GPU cost comes mainly from the dynamic 2D foreground layers rather than from the NeRF model for the 3D background? In my case, I mainly care about the background. Could I reduce the cost of the foreground layers to work around this?
2. "patch-based training": is there any code or repository I can refer to?

logchan commented 1 year ago

1. You can try jointly training at a low resolution to obtain masks that include the objects and their shadows. Then upsample the masks to high resolution and optimize only the NeRF model with the high-resolution images.
2. I don't have one in mind. The general idea is to build each training batch from random crops rather than entire images. If you mainly care about the background, (1) may be an easier path.
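On the data side, the suggestion to train jointly at low resolution, upsample the mask, and then optimize only the NeRF at high resolution might look roughly like the sketch below. It assumes the low-resolution foreground mask is already a binary (H, W) array covering objects and shadows (for example, thresholded foreground alpha) and that the high resolution is an integer multiple of the low one. `upsample_mask_nearest`, `dilate` and `sample_background_pixels` are hypothetical helpers; nearest-neighbour upsampling and the safety dilation are choices made here, not something the thread specifies. The sampled (row, col) coordinates would then be turned into rays for the NeRF loss, so foreground pixels never supervise the background.

```python
import numpy as np


def upsample_mask_nearest(mask: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a boolean (H, W) mask by an integer factor."""
    return np.repeat(np.repeat(mask, factor, axis=0), factor, axis=1)


def dilate(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary dilation with a (2r+1) x (2r+1) square, to cover the blocky edges left by upsampling."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    # OR together every shifted copy of the mask within the square neighbourhood.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def sample_background_pixels(fg_mask: np.ndarray, n: int, rng: np.random.Generator) -> np.ndarray:
    """Sample n distinct (row, col) coordinates where the foreground mask is False."""
    rows, cols = np.nonzero(~fg_mask)
    pick = rng.choice(rows.size, size=n, replace=False)
    return np.stack([rows[pick], cols[pick]], axis=1)


# Usage with synthetic data: a 540p mask with one rectangular "object + shadow" region.
rng = np.random.default_rng(0)
low_res_mask = np.zeros((540, 960), dtype=bool)
low_res_mask[200:300, 400:500] = True

high_res_mask = dilate(upsample_mask_nearest(low_res_mask, 2), radius=2)
coords = sample_background_pixels(high_res_mask, n=4096, rng=rng)
print(high_res_mask.shape)  # (1080, 1920)
print(coords.shape)  # (4096, 2)
```

Dilating before sampling trades a few background pixels near object boundaries for a lower chance that blurry foreground or shadow edges leak into the background supervision.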

longyangqi commented 1 year ago

Great idea! I will give it a try.