autonomousvision / unimatch

[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
https://haofeixu.github.io/unimatch/
MIT License

Sparse GT training question #11

Closed yzpick closed 1 year ago

yzpick commented 1 year ago

Thanks for the great work! I have a question about fine-tuning on a driving stereo dataset. Do you use the sparse GT directly for training, or do you preprocess the GT to get a dense map first? Thanks!

haofeixu commented 1 year ago

Hi, we directly use the sparse GT.

yzpick commented 1 year ago

Thanks for the quick reply! With sparse GT, do you mask out the invalid pixels in the loss calculation, or do you compensate for the difference between dense and sparse maps in some other way? Another question: do you find that accuracy drops when fewer pixels have GT supervision? In other words, is denser GT always better, or does GT density stop making much difference beyond a certain point?

Thanks for your time!

haofeixu commented 1 year ago

Yes, invalid pixels are masked out.
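For readers following along, masking invalid pixels could look roughly like the sketch below. This is not the repository's actual loss code. It assumes invalid pixels in the sparse GT are stored as 0 and that disparities above a cap are also ignored; the function name `masked_l1_loss` and the `max_disp` parameter are hypothetical and used for illustration only.

```python
import numpy as np


def masked_l1_loss(pred, gt, max_disp=400.0):
    """Mean absolute error over pixels that have valid ground truth.

    Assumptions (not taken from the repository): invalid GT pixels are
    stored as 0, and disparities >= max_disp are treated as invalid too.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Boolean mask of pixels that carry usable supervision.
    valid = (gt > 0) & (gt < max_disp)
    if not valid.any():
        # No supervised pixel in this sample: contribute nothing.
        return 0.0

    # Only valid pixels enter the average; the rest are ignored entirely.
    return float(np.abs(pred[valid] - gt[valid]).mean())


# Toy 2x2 example: two pixels have GT, two are invalid (stored as 0).
gt = np.array([[0.0, 10.0], [20.0, 0.0]])
pred = np.array([[5.0, 12.0], [19.0, 50.0]])
loss = masked_l1_loss(pred, gt)
```

Because the average runs over valid pixels only, predictions at unsupervised locations (here 5.0 and 50.0) have no influence on the loss.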

I feel that denser GT would lead to better performance, but I haven't run such comparisons, so I am not absolutely certain. To verify this, you could subsample the dense GT (e.g., on the Scene Flow dataset) for training and see how the performance varies. This is an interesting question, and I would be happy to hear your findings if you do run such comparisons in the future. Thanks!
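One possible way to set up the subsampling experiment is sketched below. It keeps each GT pixel independently with a fixed probability and marks the rest as invalid. The helper `subsample_gt`, the `keep_ratio` parameter, and the use of 0 as the invalid marker are assumptions made for illustration; uniform random dropping is also only a rough stand-in for real LiDAR sparsity patterns.

```python
import numpy as np


def subsample_gt(dense_gt, keep_ratio, seed=0, invalid_value=0.0):
    """Randomly keep a fraction of dense GT pixels; mark the rest invalid.

    Assumption: `invalid_value` (0 here) never occurs as a real GT value,
    so it can double as the "no supervision" marker.
    """
    rng = np.random.default_rng(seed)
    dense_gt = np.asarray(dense_gt, dtype=np.float64)

    # Each pixel is kept independently with probability keep_ratio.
    keep = rng.random(dense_gt.shape) < keep_ratio
    sparse_gt = np.where(keep, dense_gt, invalid_value)
    return sparse_gt, keep


# Toy dense disparity map with a constant value of 32 pixels.
dense = np.full((100, 100), 32.0)
sparse_gt, keep_mask = subsample_gt(dense, keep_ratio=0.05)
density = keep_mask.mean()  # fraction of pixels that kept their GT
```

Training the same model at several `keep_ratio` values and evaluating each run against the full dense GT would show how accuracy varies with supervision density.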

yzpick commented 1 year ago

Thanks! Yeah, I will run some experiments on it. Could I have your training configs to reproduce the Argoverse numbers in the paper? Also, after fine-tuning, is the final output still dense, or does it eventually become sparse, with valid values only in the regions covered by the sparse GT? I am wondering whether the dense performance of the pretrained network carries over through fine-tuning via some implicit interpolation by the network, even with sparse GT, or whether with more epochs it will sooner or later regress to a purely sparse output. Ultimately I want sparse input but dense output, and I am not sure whether this can be achieved by taking a dense pretrained model and fine-tuning it directly on sparse data, or whether I must add an explicit interpolation module to the network. Thanks!

haofeixu commented 1 year ago

From my experiments on the Argoverse dataset, the final predictions are less reliable in regions that have no ground-truth supervision (e.g., the sky; a similar phenomenon can be observed in visual results on the KITTI dataset). So if the model is finally fine-tuned with sparse GT, this problem may still occur even if the model was pretrained on dense GT data.