Hi @alfinnurhalim ,
We simply call head_2d after the ResNet backbone. This head is implemented here and contains only 2 MLPs. To try it without 3d detection you can get features_2d here, print them, and return. You can even remove neck, neck3d and bbox_head to make the model lighter.
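For reference, here is a minimal sketch of what such a 2-MLP head on top of pooled backbone features could look like. The class name, feature dimensions, and output parameterization (pitch/roll angles plus a layout box) are assumptions for illustration, not the repository's actual implementation:

```python
import torch
import torch.nn as nn


class TwoMLPHead(nn.Module):
    """Illustrative 2-MLP head on top of pooled backbone features.

    One MLP regresses camera angles (pitch, roll; yaw assumed 0),
    the other regresses layout box parameters. Shapes and names are
    assumptions, not the repo's actual head_2d implementation.
    """

    def __init__(self, in_channels=2048, hidden=256):
        super().__init__()
        self.angle_mlp = nn.Sequential(
            nn.Linear(in_channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),   # pitch, roll
        )
        self.layout_mlp = nn.Sequential(
            nn.Linear(in_channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 7),   # layout box: center (3), size (3), heading (1)
        )

    def forward(self, features_2d):
        # features_2d: (N, C, H, W) feature map from the ResNet backbone
        pooled = features_2d.mean(dim=(2, 3))   # global average pooling -> (N, C)
        return self.angle_mlp(pooled), self.layout_mlp(pooled)


if __name__ == '__main__':
    head = TwoMLPHead()
    feats = torch.randn(2, 2048, 12, 40)        # dummy backbone output
    angles, layout = head(feats)
    print(angles.shape, layout.shape)           # torch.Size([2, 2]) torch.Size([2, 7])
```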
Why does it not also predict the yaw of the camera? One more thing: the ground truth for the layout loss is the size of the layout, which is the voxel size times the number of voxels, is that correct? Thank you
Hi @alfinnurhalim ,
We follow the Total3dUnderstanding paper and the benchmark, assuming yaw=0. For more details you can follow their paper / code. We adapt their angles -> extrinsic matrix transformation here.
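As a rough sketch of what an angles -> extrinsic (rotation) transformation with yaw fixed to 0 can look like; the axis order and sign conventions below are assumptions for illustration and may differ from the actual transformation in the code:

```python
import numpy as np


def angles_to_extrinsic(pitch, roll, yaw=0.0):
    """Compose a camera rotation from Euler angles with yaw fixed to 0.

    Angles are in radians. The axis order and signs here are assumptions;
    the repository's actual convention may differ.
    """
    rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch),  np.cos(pitch)]])
    rz = np.array([[np.cos(roll), -np.sin(roll), 0],
                   [np.sin(roll),  np.cos(roll), 0],
                   [0, 0, 1]])
    ry = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    return ry @ rx @ rz   # with yaw=0 this reduces to rx @ rz


if __name__ == '__main__':
    R = angles_to_extrinsic(pitch=np.deg2rad(10), roll=np.deg2rad(-2))
    print(np.round(R, 3))
```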
No, the number of voxels times the voxel size is fixed for each dataset. The layout here is the actual size of the room, limited by its walls, floor and ceiling. We also follow the Total3dUnderstanding code for ground truth layout estimation; you can find this info in *.json.
Hi @filaPro, thanks for the clarification! I will look further into their paper for more info. Thank you very much.
Hi Danila, I want to try your extra 2d head to estimate the camera pose on my dataset (indoor, SUN RGB-D format). Could you please elaborate on how and where you implemented it? How did you 'connect' this extra network to the main network? Is it separate from the standard indoor_dataset head?
Thank you in advance.