TRI-ML / packnet-sfm

TRI-ML Monocular Depth Estimation Repository
https://tri-ml.github.io/packnet-sfm/
MIT License

Why 3D conv? #8

Closed by patrick-llgc 2 years ago

patrick-llgc commented 4 years ago

First of all, congrats on the impressive work! The image reconstruction sanity check is highly inspiring.

I have a question about why PackNet uses 3D convolutions.

I think what PackNet wants to do is blend the 2x2 spatial content that is now scattered across the channel dimension, so it uses a third dimension to blend across channels. Wouldn't a grouped convolution make more sense for this application?
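To make the question concrete, here is a minimal pure-Python sketch of a space-to-depth step and of which packed channels a depth-3 kernel or a grouped convolution would mix. It assumes the channel ordering used by PyTorch's `pixel_unshuffle` (packed index = `c * r * r + i * r + j`); the helper names `space_to_depth`, `source_of`, `depth_window` and `group_of` are made up for illustration and are not taken from the repository.

```python
def space_to_depth(x, r=2):
    """Fold each r x r spatial block into channels.

    x is a nested list shaped [C][H][W]; the result is [C*r*r][H//r][W//r].
    Assumed ordering: packed channel = c*r*r + i*r + j (as in pixel_unshuffle).
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for c in range(C):
        for i in range(r):
            for j in range(r):
                out.append([[x[c][h * r + i][w * r + j]
                             for w in range(W // r)]
                            for h in range(H // r)])
    return out


def source_of(k, r=2):
    """Map packed channel k back to (source channel, row offset, col offset)."""
    c, rem = divmod(k, r * r)
    i, j = divmod(rem, r)
    return c, i, j


def depth_window(k, num_channels, size=3):
    """Packed channels seen by a depth kernel of `size` centred on channel k."""
    half = size // 2
    return [m for m in range(k - half, k + half + 1) if 0 <= m < num_channels]


def group_of(k, r=2):
    """Group index of packed channel k if groups == number of source channels."""
    return k // (r * r)


# Two source channels of size 2x2; each value is tagged as (channel, row, col).
x = [[[(c, h, w) for w in range(2)] for h in range(2)] for c in range(2)]
packed = space_to_depth(x)  # 8 channels of size 1x1

for k in range(len(packed)):
    print(k, packed[k][0][0], "group", group_of(k))

# A depth-3 window centred on packed channel 3 straddles two source channels,
# whereas a grouped convolution would only mix channels within one group.
print([source_of(m) for m in depth_window(3, len(packed))])
```

Under this ordering, the four offsets of one source channel sit next to each other, so a grouped convolution with one group per source channel would blend exactly one 2x2 block, while a sliding depth kernel also mixes across neighbouring source channels.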

Another comment: the paper mentions that "2D conv are not designed to directly leverage the tiled structure of this feature space, instead, we propose to first learn to expand this structured representation via a 3d conv layer." I did not actually see how the ablation study supports this. I only see that the results improved with the 3D conv, but perhaps that is due to the increased number of parameters in the model?
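The parameter concern can be checked with some rough arithmetic. The sketch below assumes a packing block of the form space-to-depth (r = 2), then a 3D conv with 1 input channel, `d` output channels and a 3x3x3 kernel, then a 2D conv from `4 * c * d` channels to `c_out` channels. The values `c = c_out = 64`, `k = 3` and `d = 8` are example values chosen for illustration, and the function names are hypothetical. Bias terms are included.

```python
def conv2d_params(c_in, c_out, k):
    """Weights plus biases of a k x k 2D convolution."""
    return c_out * c_in * k * k + c_out


def conv3d_params(c_in, c_out, k):
    """Weights plus biases of a k x k x k 3D convolution."""
    return c_out * c_in * k ** 3 + c_out


def packing_with_3d(c, c_out, k, d, r=2):
    """Assumed layout: space-to-depth -> Conv3d(1, d, 3) -> Conv2d(c*r*r*d, c_out, k)."""
    return conv3d_params(1, d, 3) + conv2d_params(c * r * r * d, c_out, k)


def packing_2d_only(c, c_out, k, r=2):
    """Baseline: space-to-depth -> Conv2d(c*r*r, c_out, k)."""
    return conv2d_params(c * r * r, c_out, k)


with_3d = packing_with_3d(c=64, c_out=64, k=3, d=8)
baseline = packing_2d_only(c=64, c_out=64, k=3)

print("3D conv alone:    ", conv3d_params(1, 8, 3))
print("block with 3D:    ", with_3d)
print("block without 3D: ", baseline)
print("ratio:             %.2f" % (with_3d / baseline))
```

Under these assumptions the 3D conv itself is tiny, but the 2D conv that follows it sees `d` times more input channels, so the block grows by roughly a factor of `d`. That alone does not settle whether the gain comes from structure or capacity, which is the point of the question.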

Thank you very much for your insights!

weihaosky commented 3 years ago

Also confused here