blackjack2015 / IRS

IRS: A Large Synthetic Indoor Robotics Stereo Dataset for Disparity and Surface Normal Estimation

camera pose convention #8

Open · za-cheng opened this issue 1 year ago

za-cheng commented 1 year ago

Hi there,

First, huge thanks for publishing this dataset. I'm looking to use it for MVS but am struggling with the camera pose convention in Trace.txt. Could you provide more explanation, please?

Are these camera-to-world or world-to-camera matrices, and how is the camera coordinate system defined? That is, what are the +x, +y, +z directions? I assumed +x is right, +y down, and +z forward, but apparently that's not the case. I also notice the matrix has the translation on the last row instead of the last column as in the usual MVS convention; should I transpose the rotation matrix as well?

Cheers, Z

blackjack2015 commented 1 year ago

Hi, za-cheng,

Thanks for your interest in IRS.

  1. The matrices are camera-to-world.
  2. For the directions, +x is forward, +y is right, and +z is up, following the UE4 standard; this is a left-handed coordinate system. (See the sketch below for how to read a Trace.txt block under these conventions.)
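
Combining these two answers with the layout noted in the question (translation on the last row), the stored matrices use the row-vector convention, so transposing a whole 4x4 block yields the usual column-vector camera-to-world matrix, with the rotation transposed along with it. A minimal sketch; the loadtxt parsing is only illustrative, assuming the file is a plain sequence of 4x4 blocks as quoted later in this thread:

import numpy as np

# Load Trace.txt as a stream of numbers and regroup into 4x4 pose blocks
# (an assumption about the file layout, based on the sample quoted below).
poses = np.loadtxt("Trace.txt").reshape(-1, 4, 4)

# Each block stores the camera-to-world pose with the translation on the
# last ROW (row-vector convention). Transposing the whole 4x4 moves the
# translation to the last column and transposes the rotation in one step,
# giving the conventional column-vector camera-to-world matrix
# (still expressed in UE4's left-handed axes).
M_c2w_ue = poses[0].T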

We also noticed your excellent SIGGRAPH 2022 work, "Diffeomorphic Neural Surface Parameterization for 3D and Reflectance Acquisition". We hope we will have a chance to cooperate. Cheers!

Best regards, Qiang Wang

ccj5351 commented 4 months ago

Thanks for the valuable discussion here. Could you explain how to get the camera-to-world matrix from Trace.txt? As @za-cheng asked above, "I also notice the matrix has translation on last row, instead of last column in MVS convention, should I transpose rotation matrix as well?"

For example: in a Trace.txt file, we can see:

0.09689467371068361 0.9952946409010922 -2.565641006486985e-07 0.0
-0.9951432653217077 0.09687994137690414 0.017440138865749036 0.0
0.0173581016055659 -0.0016896012468288792 0.999847909212335 0.0
-6.26467896 -20.06545654 1.50066193 1.0

How do I get the regular "row-major" camera-to-world matrix? What should it be, given the 4 lines above?

Thanks!

ccj5351 commented 4 months ago

Finally, I got the camera-to-world pose in the OpenCV-style coordinate system.

Please see my code on how to generate the camera pose from the "/UE_Trace.txt" file (e.g., /IRS/Auxiliary/CameraPos/Restaurant/DinerEnvironment_Dark/UE_Trace.txt).

As for the transformation matrix from Unreal Engine coordinates (x forward, y right, z up) to OpenCV-style coordinates (x right, y down, z forward): you can check the details in Section 2.2 of John J. Craig, Introduction to Robotics: Mechanics and Control, Third Edition (2005); see the screenshot below:

[screenshot: coordinate-frame diagram from Craig, Section 2.2]

This gives the matrix from Unreal Engine to OpenCV-style coordinates as:

import numpy as np

# Change of basis from the UE world frame (x forward, y right, z up)
# to the OpenCV-style world frame (x right, y down, z forward).
T = np.array([[0, 1, 0, 0],
              [0, 0, -1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1]], dtype=np.float32)
T_wue_2_w = T

# Similarly, the transformation from the UE camera frame (cue) to the
# OpenCV-style camera frame (c) is the same permutation.
T_cue_2_c = T
T_c_2_cnet = np.linalg.inv(T_cue_2_c)
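
Putting the whole recipe together, the end-to-end conversion is a change of basis applied on both sides of the UE pose. This is only a sketch of the recipe, not the verified script (see the linked code for that); the transpose step assumes the translation-on-last-row layout quoted above:

import numpy as np

def ue_trace_pose_to_opencv(block_4x4):
    # `block_4x4` is one 4x4 block read from UE_Trace.txt, with the
    # translation stored on the last row; transpose to get the usual
    # column-vector camera-to-world matrix.
    M_ue = np.asarray(block_4x4, dtype=np.float64).T

    # Change of basis: UE (x fwd, y right, z up) -> OpenCV-style
    # (x right, y down, z fwd), as derived above.
    T = np.array([[0, 1, 0, 0],
                  [0, 0, -1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1]], dtype=np.float64)

    # Express the same camera-to-world transform in OpenCV axes:
    # the world side gets T (= T_wue_2_w), the camera side gets
    # inv(T) (= T_c_2_cnet above).
    return T @ M_ue @ np.linalg.inv(T)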

The generated camera poses have been verified by depth warping among multi-view images:

You can see that the pixel highlighted by a red circle is correctly warped to the location highlighted by a green circle in the other view.

[screenshot: depth-warping check; red circle in one view, matching green circle in the other]
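
For reference, the warping check described above boils down to back-projecting a pixel with its depth in one view and reprojecting it into the other. A minimal sketch, where the intrinsics K and the two converted camera-to-world poses are assumed inputs (the actual verification script is in the linked code):

import numpy as np

def warp_pixel(u, v, depth, K, M_i_c2w, M_j_c2w):
    # Back-project pixel (u, v) at the given depth into camera i
    # (OpenCV convention: z forward, so depth scales the ray directly).
    p_cam_i = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Camera i -> world -> camera j, using the converted poses.
    p_world = M_i_c2w @ np.append(p_cam_i, 1.0)
    p_cam_j = np.linalg.inv(M_j_c2w) @ p_world
    # Project into view j; the result should land on the same surface
    # point (the red/green circles in the screenshot).
    uv = K @ p_cam_j[:3]
    return uv[:2] / uv[2]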

blackjack2015 commented 4 months ago

Excellent! Would you mind opening a pull request to help us refine the project? Thank you very much!

Best regards, Qiang Wang

ccj5351 commented 4 months ago

Sure. My pleasure. Just made the pull request. Thanks!