NagabhushanSN95 / DeCOMPnet

Official code release for the ISMAR 2022 paper "Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images"

Question about training #8

Closed · jimmyfyx closed this issue 5 months ago

jimmyfyx commented 10 months ago

Hi,

I have a few questions regarding the training part, especially the data format for training.

NagabhushanSN95 commented 10 months ago
  1. As I mentioned in this issue,

> We train the flow estimator to predict the object motion from f_n to f_{n-2} as well as f_n to f_{n+1} (by picking one of them randomly).

Lines 64–75 essentially do that: pick one of them at random.

  2. PWC-Net/ARFlow are fully convolutional, i.e. the resolution of the input images can differ between train and test time and the model still works fine. Due to GPU memory constraints, we train on patches rather than on full images. If you can train on full images, that is of course better, since the model can learn more. However, when training on patches, the patch_size is important: it should be larger than the motion expected in the dataset, so that the cropped patches contain matching moving objects for the flow estimator to train on. We chose the patch_size by visual inspection. A rough sketch of both points is given after this list.
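
For illustration only, here is a minimal sketch of the two ideas above: picking one target frame (f_{n-2} or f_{n+1}) at random, and cropping co-located patches that are large enough to keep the moving objects visible in both crops. The names (`sample_training_pair`, `frames`, `patch_size`) are hypothetical and do not correspond to the repository's actual data loader.

```python
import random
import numpy as np

def sample_training_pair(frames: dict, n: int, patch_size: int = 256):
    """Pick the flow target (f_{n-2} or f_{n+1}) at random and crop matching patches.

    Hypothetical sketch: `frames` maps frame index -> (H, W, C) numpy array.
    """
    # Randomly choose which direction the flow estimator should predict this iteration.
    target_index = random.choice([n - 2, n + 1])

    frame_n = frames[n]
    frame_t = frames[target_index]

    # Crop the same window from both frames. The patch must be larger than the
    # expected object motion so that moving objects appear in both crops.
    h, w = frame_n.shape[:2]
    y = np.random.randint(0, h - patch_size + 1)
    x = np.random.randint(0, w - patch_size + 1)
    patch_n = frame_n[y:y + patch_size, x:x + patch_size]
    patch_t = frame_t[y:y + patch_size, x:x + patch_size]
    return patch_n, patch_t
```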
jimmyfyx commented 10 months ago

Thanks for the answer! Ah, I see, sorry I missed your previous answer. But for the training data, don't we need both the flow from f_n to f_{n-2} and from f_n to f_{n+1} at the same time? Why do we need to pick one of them at random?

NagabhushanSN95 commented 10 months ago

The flow estimator is trained separately, independently of what the rest of our framework is doing. The goal of this training is to learn flow estimation between MPIs, so it doesn't matter which pair of frames you estimate the flow between. The only thing that matters is that camera motion is nullified between the frames.
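
To make the "camera motion nullified" requirement concrete: the repository does this with MPIs, but the idea can be sketched with plain frames. Assuming known depth and relative camera pose, one could warp f_m into f_n's camera before handing the pair to the flow estimator, so that any remaining displacement comes from object motion only. The function below is a hypothetical illustration, not code from this repository.

```python
import torch
import torch.nn.functional as F

def stabilize_frame(frame_m: torch.Tensor,    # (1, 3, H, W) source frame f_m
                    depth_n: torch.Tensor,    # (1, 1, H, W) depth at view n
                    K: torch.Tensor,          # (3, 3) camera intrinsics
                    T_n_to_m: torch.Tensor):  # (4, 4) transform: camera n -> camera m
    """Warp f_m into f_n's camera using known depth and pose (backward warping),
    so that any residual misalignment between f_n and the result is object motion."""
    _, _, h, w = frame_m.shape
    device = frame_m.device

    # Pixel grid of the target view n, in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing='ij')
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().reshape(3, -1)  # (3, H*W)

    # Back-project to 3D in view n, transform to view m, re-project.
    cam_pts = torch.linalg.inv(K) @ pix * depth_n.reshape(1, -1)
    cam_pts_h = torch.cat([cam_pts, torch.ones(1, h * w, device=device)], dim=0)
    pts_m = (T_n_to_m @ cam_pts_h)[:3]
    proj = K @ pts_m
    uv = proj[:2] / proj[2:].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample and resample f_m at the projected locations.
    grid = torch.stack([2 * uv[0] / (w - 1) - 1,
                        2 * uv[1] / (h - 1) - 1], dim=-1).reshape(1, h, w, 2)
    return F.grid_sample(frame_m, grid, align_corners=True)
```

The flow estimator would then be trained on (f_n, stabilized f_m) pairs, where the displacement it learns to predict corresponds to 3D object motion rather than camera motion.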