SamsungLabs / imvoxelnet

[WACV2022] ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection
MIT License

Extra 2D Head #42

Closed alfinnurhalim closed 2 years ago

alfinnurhalim commented 2 years ago

Hi Danila, I want to try your extra 2D head to estimate the camera pose on my dataset (indoor, SUN RGB-D format). Could you please elaborate on how and where you implemented it? How did you connect this extra network to the main network? Is it separate from the standard indoor dataset head?

Thank you in advance

filaPro commented 2 years ago

Hi @alfinnurhalim ,

We simply call head_2d after the call to the ResNet backbone. This head is implemented here and contains only 2 MLPs. To try it without 3D detection you can get features_2d here, print them, and return. You can even remove neck, neck3d and bbox_head to make the model lighter.
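
For illustration, a minimal sketch of that call order, assuming an mmdetection-style detector with `backbone` and `head_2d` modules; the class and attribute names below are hypothetical stand-ins, not the exact repo API:

```python
import torch

class PoseOnlyDetector(torch.nn.Module):
    """Hypothetical wrapper that keeps only the 2D branch."""

    def __init__(self, backbone, head_2d):
        super().__init__()
        self.backbone = backbone  # e.g. a ResNet returning an image feature map
        self.head_2d = head_2d    # two MLPs: camera angles and room layout

    def forward(self, img):
        x = self.backbone(img)         # 2D image features
        features_2d = self.head_2d(x)  # (pitch/roll angles, layout box)
        # Returning here skips neck, neck3d and bbox_head entirely,
        # so no 3D detection is run.
        return features_2d
```
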

alfinnurhalim commented 2 years ago

Why does it not also predict the yaw of the camera? One more thing: is the ground truth for the layout loss the size of the layout, i.e. the voxel size times the number of voxels? Thank you

filaPro commented 2 years ago

Hi @alfinnurhalim ,

We follow the Total3dUnderstanding paper and its benchmark, which assume yaw=0. For more details you can follow their paper / code. We adapt their angles -> extrinsic matrix transformation here.
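
A hedged sketch of such a transformation, building a camera rotation from pitch and roll with yaw fixed to 0; the axis conventions and multiplication order here are assumptions and may differ from the repo's actual code:

```python
import numpy as np

def angles_to_rotation(pitch: float, roll: float) -> np.ndarray:
    """Rotation part of the extrinsic matrix from pitch and roll, yaw = 0."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    # Rotation about the x-axis (pitch).
    r_pitch = np.array([[1, 0,   0],
                        [0, cp, -sp],
                        [0, sp,  cp]])
    # Rotation about the z-axis (roll).
    r_roll = np.array([[cr, -sr, 0],
                       [sr,  cr, 0],
                       [0,   0,  1]])
    # yaw = 0 means no rotation about the vertical axis is applied.
    return r_roll @ r_pitch
```
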

No, the number of voxels times the voxel size is fixed for each dataset. The layout here is the actual size of the room, limited by its walls, floor and ceiling. We also follow the Total3dUnderstanding code for ground truth layout estimation; you can find this info in the *.json files.
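
To illustrate the distinction, here is a small sketch; the grid resolution and voxel size below are made-up example values, not the actual config numbers:

```python
# Hypothetical per-dataset constants: the voxel volume the model covers
# is fixed, regardless of the scene.
n_voxels = (40, 40, 16)          # example grid resolution
voxel_size = (0.16, 0.16, 0.16)  # example voxel edge length in meters

grid_extent = tuple(n * s for n, s in zip(n_voxels, voxel_size))
print(grid_extent)  # (6.4, 6.4, 2.56) meters for every scene

# The layout target is instead a per-scene 3D box (walls/floor/ceiling),
# read from the Total3dUnderstanding-style ground truth in the *.json files.
```
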

alfinnurhalim commented 2 years ago

Hi @filaPro, thanks for the clarification! I will look further into their paper for more info. Thank you very much