ardaduz / deep-video-mvs

Code for "DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion" (CVPR 2021)
MIT License

Hidden State Warping - GT vs Prediction #18

Open mohammed-amr opened 2 years ago

mohammed-amr commented 2 years ago

Hello,

I'm looking at your description for how to train the fusion model in the supplemental:

"Finally, we load the best checkpoint and finetune only the cell for another 25K iterations with a learning rate of 5e−5 while warping the hidden states with the predicted depth maps."

The current training script at fusionnet/run-training.py doesn't have a flag for this. I can see that the GT depth is used for warping the current state at line 249.

What should I use as a depth estimator for this step? Should I borrow from this line at fusionnet/run-testing.py? Or (more likely) this differentiable estimator at line 157 in utils.py?

Thanks.

ardaduz commented 2 years ago

Sorry for not having that part in the repository. Both should work and give similar results, since training is done with square images and with gradient flow disabled. As far as I can remember, the non-differentiable function ran into GPU memory issues on a GTX 1080 Ti during training for some reason (memory is already close to maxed out with the current batch and sub-sequence sizes). Therefore:

  1. Borrow lines 87, 88, [179, 191], 201, and 202 from fusionnet/run-testing.py.
  2. Replace the non-differentiable function with the differentiable one, purely to avoid the potential memory issues. Don't forget to .detach() the prediction tensor when assigning it to the previous_depth variable, so that gradient flow is disabled.
  3. Set the function's image size parameters to Config.training_image_width and Config.training_image_height.
  4. Finetune only the LSTMFusion module.
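
For readers following along, steps 2 and 4 can be sketched roughly as below. This is only a toy illustration, not the repository's code: ToyFusionNet, its submodule names, freeze_all_but, and the multiplicative "warp" are hypothetical stand-ins. It assumes the goal is (a) to train only the fusion cell and (b) to reuse the predicted depth for the next frame without letting gradients flow through it. The 5e-5 learning rate is the one quoted from the supplemental.

```python
import torch
import torch.nn as nn


class ToyFusionNet(nn.Module):
    """Hypothetical stand-in for an encoder, a recurrent fusion cell, and a depth decoder."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 4, 3, padding=1)
        # Placeholder for the real ConvLSTM fusion cell (assumption).
        self.lstm_fusion = nn.Conv2d(8, 4, 3, padding=1)
        self.decoder = nn.Conv2d(4, 1, 3, padding=1)

    def forward(self, image, hidden):
        features = self.encoder(image)
        hidden = torch.tanh(self.lstm_fusion(torch.cat([features, hidden], dim=1)))
        depth = torch.sigmoid(self.decoder(hidden))
        return depth, hidden


def freeze_all_but(model, trainable_prefix):
    """Leave requires_grad=True only on parameters whose name starts with the prefix."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(trainable_prefix)


torch.manual_seed(0)
model = ToyFusionNet()
freeze_all_but(model, "lstm_fusion")  # step 4: finetune only the fusion cell
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=5e-5
)

hidden = torch.zeros(2, 4, 8, 8)
previous_depth = None
total_loss = torch.zeros(())

# One toy sub-sequence of three frames with random images and targets.
for _ in range(3):
    image = torch.rand(2, 1, 8, 8)
    target = torch.rand(2, 1, 8, 8)
    if previous_depth is not None:
        # Stand-in for warping the hidden state with the predicted depth;
        # the real warp is geometric and also needs poses and intrinsics.
        hidden = hidden * previous_depth
    depth, hidden = model(image, hidden)
    total_loss = total_loss + (depth - target).abs().mean()
    # Step 2: detach so no gradient flows back through the depth used for warping.
    previous_depth = depth.detach()

optimizer.zero_grad()
total_loss.backward()
optimizer.step()
```

After the backward pass, only the lstm_fusion parameters carry gradients, and previous_depth has requires_grad set to False, so the warp cannot backpropagate into the depth prediction.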

I cannot test this right now, so please let me know how it goes or if you need more info.

mohammed-amr commented 2 years ago

Thanks!

I've gotten it working given the changes you've suggested and the available code snippets. I've put my version of the file here.

It's training now. I'll let you decide whether to close this issue or keep it open as a signpost for others.

Thanks again for your quick reply and help.