fabiotosi92 / NeRF-Supervised-Deep-Stereo

A novel paradigm for collecting and generating stereo training data using neural rendering
https://nerfstereo.github.io/
MIT License

Metrics results #6

Closed · Nina-Konovalova closed 1 year ago

Nina-Konovalova commented 1 year ago

Thank you very much for your work!

I'd like to ask a question about the evaluation on the 3nerf dataset. When I run it on 100 random photos, the results differ substantially between baselines: with baseline 0.50 they seem relatively poor, while with baseline 0.10 they are much better.

| Baseline | EPE    | bad 1.0 | bad 2.0 | bad 3.0 |
|----------|--------|---------|---------|---------|
| 0.50     | 2.5572 | 41.63%  | 19.63%  | 12.59%  |
| 0.10     | 0.3576 | 3.93%   | 1.70%   | 1.06%   |

Should I apply some disparity preprocessing before evaluation to obtain good results? And should any additional preprocessing steps be considered during training?
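For reference, this is roughly how I compute these metrics (a minimal NumPy sketch; `pred`, `gt`, and `valid` are my own placeholder names, not identifiers from this repo):

```python
import numpy as np

def stereo_metrics(pred, gt, valid=None):
    # pred, gt: (H, W) float disparity maps; valid: optional boolean
    # mask of pixels to evaluate (e.g. pixels with a reference value).
    if valid is None:
        valid = gt > 0
    err = np.abs(pred[valid] - gt[valid])  # per-pixel absolute disparity error
    metrics = {"EPE": err.mean()}
    for t in (1.0, 2.0, 3.0):
        # bad-t: percentage of evaluated pixels whose error exceeds t pixels
        metrics[f"bad {t}"] = 100.0 * (err > t).mean()
    return metrics
```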

fabiotosi92 commented 1 year ago

Hello, the issue is that you should not evaluate the network's predictions against the disparity maps obtained from NeRF, as they cannot be considered ground truth. The evaluation should instead be conducted on major stereo benchmarks such as KITTI and Middlebury.

Nina-Konovalova commented 1 year ago

Thank you very much for the answer!

But as I understand it, we train stereo models only on the NeRF dataset and then test on other data. So why don't we get good results on the training data itself? In fact, the quality varies considerably across baselines.

Also, should we apply any additional preprocessing to the NeRF disparities, or do we only need the augmentations from RAFT-Stereo?

fabiotosi92 commented 1 year ago

I apologize for the delay in my response. In these past few days, I have been busy due to the CVPR conference, and I couldn't respond promptly.

To address your question, it's important to clarify whether the evaluation was conducted on the disparity maps filtered with uncertainty (AO) or on the dense disparity maps. If it was the dense maps, I suggest evaluating only on the points considered more reliable after removing outliers. For further guidance on filtering unreliable points, I recommend reading the paper, which provides detailed insights.
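To illustrate what such filtering could look like (a hedged sketch only; `disp`, `conf`, and the 0.5 threshold are assumptions of mine, not the exact procedure or file format from the paper):

```python
import numpy as np

def reliable_mask(conf, threshold=0.5):
    # Keep only pixels whose confidence (e.g. the AO/uncertainty map
    # rendered alongside the disparity) exceeds a chosen threshold.
    # The threshold value here is illustrative and should be tuned.
    return conf > threshold

# Evaluate only on the reliable pixels, e.g. with the metrics sketch above:
# metrics = stereo_metrics(pred, disp, valid=reliable_mask(conf))
```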

Additionally, it's worth considering that evaluating on disparity maps obtained with a larger baseline will inevitably lead to higher errors than evaluating on a smaller one, simply because the disparity values themselves are larger.
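To make the scaling concrete (my own arithmetic, not a figure from the thread): disparity grows linearly with baseline, `d = f * B / z`, so moving from baseline 0.10 to 0.50 multiplies every disparity by 5. An absolute error that stays under the bad 1.0 threshold at the small baseline can therefore exceed 3 pixels at the large one, even when the relative error is identical.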

However, as mentioned before, I do not recommend relying solely on this approach to assess the quality of the trained networks. Instead, evaluate them on benchmarks that provide highly accurate ground-truth disparity maps.