NagabhushanSN95 / DeCOMPnet

Official code release for the ISMAR 2022 paper "Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images"

Question about training #8

Closed · jimmyfyx closed this issue 5 months ago

jimmyfyx commented 10 months ago

Hi,

I have a few questions regarding the training part, especially the data format for training.

NagabhushanSN95 commented 10 months ago
  1. As I mentioned in this issue,

> We train the flow estimator to predict the object motion from f_n to f_{n-2} as well as f_n to f_{n+1} (by picking one of them randomly).

Lines 64–75 essentially do that: pick one of them at random.

  2. PWC-Net/ARFlow are fully convolutional, i.e. the resolution of the input images can differ between train and test time and the model still works fine. Due to GPU memory constraints, we train on patches rather than on full images. If you can train on full images, that is of course better, since the model can learn more. However, when training on patches, the patch_size is important: it should be larger than the motion expected in the dataset, so that the cropped patches contain matching moving objects for the flow estimator to train on. We chose the patch_size by visual inspection. A rough sketch of both points is given after this list.
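
For illustration only, here is a minimal sketch of the two ideas above: picking one target frame (f_{n-2} or f_{n+1}) at random, and cropping co-located patches that are large enough to keep the moving objects visible in both crops. The names (`sample_training_pair`, `frames`, `patch_size`) are hypothetical and do not correspond to the repository's actual data loader.

```python
import random
import numpy as np

def sample_training_pair(frames: dict, n: int, patch_size: int = 256):
    """Pick the flow target (f_{n-2} or f_{n+1}) at random and crop matching patches.

    Hypothetical sketch: `frames` maps frame index -> (H, W, C) numpy array.
    """
    # Randomly choose which direction the flow estimator should predict this iteration.
    target_index = random.choice([n - 2, n + 1])

    frame_n = frames[n]
    frame_t = frames[target_index]

    # Crop the same window from both frames. The patch must be larger than the
    # expected object motion so that moving objects appear in both crops.
    h, w = frame_n.shape[:2]
    y = np.random.randint(0, h - patch_size + 1)
    x = np.random.randint(0, w - patch_size + 1)
    patch_n = frame_n[y:y + patch_size, x:x + patch_size]
    patch_t = frame_t[y:y + patch_size, x:x + patch_size]
    return patch_n, patch_t
```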
jimmyfyx commented 10 months ago

Thanks for the answer! Ah, I see, sorry I missed your previous answer. But for the training data, don't we need both the flow from f_n to f_{n-2} and from f_n to f_{n+1} at the same time? Why do we need to pick one of them at random?

NagabhushanSN95 commented 10 months ago

The flow estimator is trained separately, independently of what the rest of our framework is doing. The goal of this training is to learn flow estimation between MPIs, so it doesn't matter which pair of frames you estimate the flow between. The only thing that matters is that camera motion is nullified between the frames.
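
To make the "camera motion nullified" requirement concrete: the repository does this with MPIs, but the idea can be sketched with plain frames. Assuming known depth and relative camera pose, one could warp f_m into f_n's camera before handing the pair to the flow estimator, so that any remaining displacement comes from object motion only. The function below is a hypothetical illustration, not code from this repository.

```python
import torch
import torch.nn.functional as F

def stabilize_frame(frame_m: torch.Tensor,    # (1, 3, H, W) source frame f_m
                    depth_n: torch.Tensor,    # (1, 1, H, W) depth at view n
                    K: torch.Tensor,          # (3, 3) camera intrinsics
                    T_n_to_m: torch.Tensor):  # (4, 4) transform: camera n -> camera m
    """Warp f_m into f_n's camera using known depth and pose (backward warping),
    so that any residual misalignment between f_n and the result is object motion."""
    _, _, h, w = frame_m.shape
    device = frame_m.device

    # Pixel grid of the target view n, in homogeneous coordinates.
    ys, xs = torch.meshgrid(torch.arange(h, device=device),
                            torch.arange(w, device=device), indexing='ij')
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().reshape(3, -1)  # (3, H*W)

    # Back-project to 3D in view n, transform to view m, re-project.
    cam_pts = torch.linalg.inv(K) @ pix * depth_n.reshape(1, -1)
    cam_pts_h = torch.cat([cam_pts, torch.ones(1, h * w, device=device)], dim=0)
    pts_m = (T_n_to_m @ cam_pts_h)[:3]
    proj = K @ pts_m
    uv = proj[:2] / proj[2:].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample and resample f_m at the projected locations.
    grid = torch.stack([2 * uv[0] / (w - 1) - 1,
                        2 * uv[1] / (h - 1) - 1], dim=-1).reshape(1, h, w, 2)
    return F.grid_sample(frame_m, grid, align_corners=True)
```

The flow estimator would then be trained on (f_n, stabilized f_m) pairs, where the displacement it learns to predict corresponds to 3D object motion rather than camera motion.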