Closed by blacksino 4 years ago
The main reason is the close relationship with disparity.
In a stereo rig, disparity is proportional to inverse depth. And since disparity is a 1D optical flow map, the first methods that come to mind for estimating it are borrowed from optical flow.
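To make the disparity/inverse-depth relation concrete, here is a minimal sketch for a rectified stereo pair, using the standard relation Z = f * B / d. The function name and the focal length and baseline values are made up for illustration; they do not come from any of the networks discussed here.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in meters for a rectified stereo rig: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative values only: 700 px focal length, 0.5 m baseline.
near = depth_from_disparity(40.0, focal_px=700.0, baseline_m=0.5)
far = depth_from_disparity(20.0, focal_px=700.0, baseline_m=0.5)
```

Halving the disparity doubles the depth, which is why a network trained to regress disparity is effectively regressing inverse depth up to the constant f * B.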
In this case, the starting point is FlowNet, an optical flow neural network. The authors later extended it to DispNet, to get depth from stereo.
Since there was now a network that worked well and actually output inverse depth, it made sense for Zhou et al. to reuse it.
As such, Zhou's network outputs inverse depth. But this output is then inverted, because the photometric loss needs depth in the general case of camera displacement (stereo is just the special case of a pure lateral translation).
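As a rough sketch of that inversion step: the last layer goes through a sigmoid, gets scaled and offset so it stays strictly positive, and is read as inverse depth; the depth used by the photometric loss is its reciprocal. The `alpha` and `beta` values below are assumptions chosen for illustration, not constants confirmed in this thread, and `depth_from_logit` is a hypothetical helper name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def depth_from_logit(logit, alpha=10.0, beta=0.01):
    """Map a raw network output to depth via a bounded inverse depth.

    alpha and beta are illustrative assumptions: they bound the inverse
    depth to (beta, alpha + beta), so depth stays in
    (1 / (alpha + beta), 1 / beta).
    """
    inv_depth = alpha * sigmoid(logit) + beta  # what the network "outputs"
    return 1.0 / inv_depth                     # what the photometric loss uses


mid_depth = depth_from_logit(0.0)
```

The offset `beta` keeps the inverse depth away from zero, so the reciprocal never blows up. Outputting depth directly would simply skip the last line and bound the depth itself instead.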
For my PhD defense I made several slides to explain it; you can get them here. The interesting slides start at slide 10 (sorry, they're in French, but you will get the math).
Bottom line: outputting inverse depth in this particular case has no real justification other than legacy. You can output depth if you want.
That helps A LOT. Thank U for your explanation!
@ClementPinard Hey, it seems that the URL for your PhD defense is invalid now.
Indeed, I just discovered that my website had been kicked off my university's servers. I updated the links :)
You can find the slides here: https://clementpinard.fr/pdf/PhdThesis/robust_depth_learning_defense.pdf
Project page (with slides and the manuscript) here: https://clementpinard.fr/phd_thesis
Thank you for your excellent work! I have a question about the prediction layer: Zhou uses a sigmoid, then inverts the output to compute the photometric loss. Why?