GAP-LAB-CUHK-SZ / Total3DUnderstanding

Implementation of CVPR'20 Oral: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
MIT License

RGB image or RGB-D for inference? #29

Open Fizmath opened 3 years ago

Fizmath commented 3 years ago

Hello

Can we use plain RGB images taken with a mobile phone for prediction?

Thanks

alando46 commented 3 years ago

You can, but you will need to get the intrinsic matrix of your phone's camera. You will also need to fine-tune a 2D detector to generate 2D bounding boxes for the object classes recognized by Total3DUnderstanding.

Fizmath commented 3 years ago

Thank you very much for the response.

I got the camera's intrinsic parameters on Android by installing a device info app: focal length = 3.46 mm, sensor size = 4.66 x 3.51, pixel array size = 4160 x 3120, orientation = 90 degrees. So from these we can construct the intrinsic matrix for cam_K.txt, right?
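For reference, here is a minimal sketch of how those values could be turned into a pinhole intrinsic matrix. It assumes the sensor size is in millimetres, square-ish pixels with no skew, a principal point at the image centre, and no lens distortion; none of that is confirmed by the device info app. The function names (`intrinsics_from_specs`, `scale_intrinsics`, `format_matrix`) are made up for illustration, and the whitespace-separated 3x3 text layout is also an assumption that should be checked against the sample cam_K.txt shipped with the repository.

```python
def intrinsics_from_specs(focal_mm, sensor_w_mm, sensor_h_mm,
                          width_px, height_px, portrait=False):
    """Approximate 3x3 pinhole intrinsic matrix from data-sheet values.

    Assumptions: sensor size is in mm, no skew, principal point at the
    image centre, no lens distortion.
    """
    # Focal length in pixels = focal length in mm * (pixels per mm).
    fx = focal_mm * width_px / sensor_w_mm
    fy = focal_mm * height_px / sensor_h_mm
    cx, cy = width_px / 2.0, height_px / 2.0
    if portrait:
        # A photo stored rotated by 90 degrees swaps the two image axes.
        fx, fy = fy, fx
        cx, cy = cy, cx
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]


def scale_intrinsics(K, scale):
    """Rescale K for an image resized uniformly by `scale`."""
    return [[K[0][0] * scale, 0.0, K[0][2] * scale],
            [0.0, K[1][1] * scale, K[1][2] * scale],
            [0.0, 0.0, 1.0]]


def format_matrix(K):
    """Render K as three whitespace-separated rows (assumed file layout)."""
    return "\n".join(" ".join(f"{v:.6f}" for v in row) for row in K)


# Values quoted in this comment: 3.46 mm lens, 4.66 x 3.51 sensor,
# 4160 x 3120 pixel array.
K = intrinsics_from_specs(3.46, 4.66, 3.51, 4160, 3120)
print(format_matrix(K))
```

If the photo is downscaled before inference, the matrix has to be rescaled by the same factor, for example `scale_intrinsics(K, 0.5)` for a half-resolution image. Values derived from a spec sheet are only approximate; a checkerboard calibration would give a more reliable matrix.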

Now I wonder how to build the camera pose (extrinsic matrix) from this intrinsic matrix. I searched the web but found no straightforward answer. How did you do that in your work?

Another question: after getting a result from your algorithm, what are the possible approaches for stitching the room layouts from overlapping photos into a complete room perimeter? Is that possible?

Thanks