alexsax / 2D-3D-Semantics

The data skeleton from Joint 2D-3D-Semantic Data for Indoor Scene Understanding
http://3dsemantics.stanford.edu
Apache License 2.0
464 stars, 67 forks

Camera Pose Format #12

Closed: N-McA closed this issue 6 years ago

N-McA commented 6 years ago

Is a more detailed description available for the camera pose format?

I was unable to determine how to transform from the point-cloud into camera coordinates.

One might assume that the usual application of the RT matrix would be sufficient, but it appears not. The T appears to be correct, but R does not align the axes appropriately in many images. I suspect this has something to do with the fields:

"camera_original_rotation" and "rotation_from_original_to_point" but their correct use is not apparent to me.

alexsax commented 6 years ago

The software that we used rotated the models upon loading (or used a different axis order), and so the R matrix is off by some 90-degree rotation from the point-cloud. If I have some time, I'll try to find that rotation and post it here and in the README. Or, if you find it, please share!
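For reference, the "usual application of the RT matrix" discussed above can be sketched as follows. This is a minimal sketch, assuming `camera_rt_matrix` is a 3x4 world-to-camera matrix `[R | t]` and that the point cloud is an (N, 3) array; the function name `world_to_camera` is hypothetical and not part of the dataset tooling. If R is off by an axis permutation or 90-degree rotation as described, the output will stay misaligned until that correction is applied.

```python
import numpy as np


def world_to_camera(points_world, rt_matrix):
    """Apply a 3x4 [R | t] matrix to (N, 3) points: x_cam = R @ x_world + t.

    Assumes the matrix maps world coordinates to camera coordinates.
    """
    rt = np.asarray(rt_matrix, dtype=float)
    if rt.shape != (3, 4):
        raise ValueError("expected a 3x4 RT matrix")
    pts = np.asarray(points_world, dtype=float)
    rotation, translation = rt[:, :3], rt[:, 3]
    # Row-vector form of R @ x + t for every point at once.
    return pts @ rotation.T + translation
```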

N-McA commented 6 years ago

Hey - yeah, if you could find it that would be very much appreciated!

N-McA commented 6 years ago

Or if there was a code block from the original work that might contain it, I wouldn't mind trawling through :)

N-McA commented 6 years ago

Some further experimentation reveals that it seems to be only some disjoint areas that suffer from a misalignment issue - in particular the corridor point-clouds appear to be rotated about some arbitrary point (well, probably their mean but I don't have time to debug further right now).

I've included some images that show the successfully recovered camera frustum from the RT matrix of a non-corridor area, and then an image that shows how the camera positions and the corridors seem mis-aligned.

If anyone has gotten this to work before, I can include the fix in a Python API that I'm planning to release for this dataset.

[Images: selection_391, selection_390, selection_392]

ir0 commented 6 years ago

From the last image, the top-view floorplan, I see that you are using the "aligned" version of the point clouds. This version is not the globally registered one, but instead follows the alignment procedure discussed in Section 3.2 "Canonical Coordinate System Among Spaces" in the paper "3D Semantic Parsing of Large-Scale Indoor Spaces".

I am not sure what steps you are following to get the point clouds (there are different ways), but I can help you go back to the original version if you tell me which file you are using and from which dataset (S3DIS or 2D3DS). It is usually just a matter of whether or not you apply the alignment angle value provided. A good way to check that your point cloud is correct, apart from the corridors being globally registered, is to check that the rooms on the right side of the floorplan you sent actually have their doors facing left. If you take a closer look, you'll see that in this floorplan the doors are facing right. This means that the rooms are rotated by 180 degrees around their XYZ mean value, which is consistent with the alignment procedure discussed in Section 3.2 of the above paper. If you fix this you should not have any misalignment issues. If you still find any, please let us know.
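Undoing a rotation about a room's mean, as described above, can be sketched as follows. This is a minimal sketch, assuming the rotation is about the vertical (Z) axis and that the points are an (N, 3) array; the function name `rotate_about_centroid_z` is hypothetical, and for real data the angle should come from the alignment angle shipped with the dataset rather than a hard-coded 180 degrees.

```python
import numpy as np


def rotate_about_centroid_z(points, angle_degrees):
    """Rotate (N, 3) points about their own XYZ mean, around the Z axis.

    Assumption: Z is the vertical axis of the point cloud.
    """
    pts = np.asarray(points, dtype=float)
    theta = np.deg2rad(angle_degrees)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0],
                      [s, c, 0.0],
                      [0.0, 0.0, 1.0]])
    centroid = pts.mean(axis=0)
    # Move the centroid to the origin, rotate, then move back.
    return (pts - centroid) @ rot_z.T + centroid
```

Applying the function twice with 180 degrees returns the original points, which is a quick sanity check before comparing door directions on the floorplan.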

N-McA commented 6 years ago

As suggested, this was due to the alignment procedure. I switched to loading the pointclouds from the .mat files included in the 2d-3d-s dataset and this immediately resolved the issue.

Loading the .mat files in Python is not quite a one-line operation; the (rather janky) code I used to do so is available in this gist: https://gist.github.com/N-McA/f8a34a46449994f8efbaef74c1536911

fuenwang commented 6 years ago

@N-McA, did you solve this surface normal problem? I have run into it as well, and I don't know how to convert the normals to camera coordinates.

cazhang commented 5 years ago

I've got a question about the 'pose' as well: looking into the pano/pose folder, each json file holds 'camera_location' and 'camera_rt_matrix'. What is the difference between 'camera_location' and the translation component of 'camera_rt_matrix'?

I'm also interested in how the camera poses of the equirectangular images are extracted. Are they computed from the camera poses of the raw data, e.g. via different pitch and yaw values? @alexsax maybe you can help

MeatValley commented 1 year ago

@cazhang the difference is just which coordinate system they are in. The camera location in the file is the location in the world frame, and the translation is in the camera frame. To check this, multiply the camera position by the rotation matrix (the first 3 columns of the RT matrix) and you will get the last column. In other words, they are the same thing, just in different coordinate systems, and the conversion between them is the rotation matrix.
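The check described above can be sketched as follows. This is a minimal sketch, assuming `camera_rt_matrix` is a 3x4 `[R | t]` matrix and `camera_location` is a 3-vector; the function names are hypothetical. One caveat: under the common world-to-camera convention (x_cam = R @ x_world + t), the camera center C satisfies t = -R @ C, so the product may match the last column only up to sign. The sketch tests both cases instead of assuming one.

```python
import numpy as np


def camera_center_from_rt(rt_matrix):
    """Recover the world-frame camera center, assuming x_cam = R @ x_world + t."""
    rt = np.asarray(rt_matrix, dtype=float)
    rotation, translation = rt[:, :3], rt[:, 3]
    # Solve 0 = R @ C + t for C; R is orthonormal, so its inverse is R.T.
    return -rotation.T @ translation


def location_translation_relation(camera_location, rt_matrix, atol=1e-4):
    """Report which sign convention links camera_location to the translation."""
    rt = np.asarray(rt_matrix, dtype=float)
    rotation, translation = rt[:, :3], rt[:, 3]
    rotated = rotation @ np.asarray(camera_location, dtype=float)
    if np.allclose(rotated, -translation, atol=atol):
        return "t = -R @ C"
    if np.allclose(rotated, translation, atol=atol):
        return "t = R @ C"
    return "no match"
```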

xuxiaoxxxx commented 10 months ago

@N-McA, did you solve this surface normal problem? I have run into it as well, and I don't know how to convert the normals to camera coordinates.

Hi, did you solve this problem? I also don't know how to convert the normals to camera coordinates. Could you share your conversion code with me?
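No conversion code was posted in this thread. As a general point, normals are direction vectors, so translation does not affect them and only the rotation part of the pose applies. The following is a minimal sketch, assuming the normals are already decoded into world-frame vectors and that `camera_rt_matrix` is a 3x4 world-to-camera `[R | t]` matrix; the function name `normals_to_camera` is hypothetical, and the axis-order caveat raised earlier in the thread may still apply to R.

```python
import numpy as np


def normals_to_camera(normals_world, rt_matrix):
    """Rotate world-frame normals (shape (..., 3)) into the camera frame.

    Only the rotation block of [R | t] is used; translation is ignored
    because normals are directions, not positions.
    """
    rotation = np.asarray(rt_matrix, dtype=float)[:, :3]
    normals = np.asarray(normals_world, dtype=float)
    rotated = normals @ rotation.T
    # Re-normalize to guard against small numerical drift in R.
    lengths = np.linalg.norm(rotated, axis=-1, keepdims=True)
    return rotated / np.clip(lengths, 1e-12, None)
```

The function accepts either an (N, 3) list of normals or an (H, W, 3) normal image, since the rotation is applied along the last axis.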