Closed: HardikJain02 closed this issue 4 months ago.
Thank you for your interest, but this is a monocular system: there is only one frame, so the normals are expressed in the camera's coordinate frame.
If we want the normals in world coordinates, we need a rigid transformation from camera to world (since normals are directions, only its rotation part affects them).
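The transform mentioned above can be sketched as follows. This is a minimal illustration, not code from the repository; `R_cam2world` is a hypothetical pose that you would obtain from your own extrinsics.

```python
import numpy as np

# Hypothetical camera-to-world rotation; identity is a placeholder.
# In practice this comes from your camera pose / extrinsics.
R_cam2world = np.eye(3)

def normals_cam_to_world(normals_cam, R):
    """Rotate per-pixel unit normals (H, W, 3) from camera to world frame.

    Normals are directions, so only the rotation part of the rigid
    transform applies; the translation does not affect them.
    """
    return normals_cam @ R.T  # (H, W, 3) @ (3, 3) -> (H, W, 3)

# Example: a floor normal pointing "up" in a y-down camera frame.
n_cam = np.array([[[0.0, -1.0, 0.0]]])
n_world = normals_cam_to_world(n_cam, R_cam2world)
```

With the identity rotation the normal is unchanged; with a real pose the output is the same normal expressed in world axes.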
Hi @JUGGHM, I am trying to understand the correspondence of RGB values in the surface normal output image to the vector orientation of the surface normal in the 3D world. In other words, what is the mapping of coordinate axes to RGB channels?
I tried a quick experiment with a simple indoor scene with walls and ceilings, where the orientation of the normals can be estimated by eye.
This is the surface normal image I get from Metric3D.
Now if I inspect the pixel values,
(1) the floor values are (126, 0, 126), which corresponds to a (0.5, 0, 0.5) normal vector when normalized to [0, 1], and to (0, -1, 0) when read directly from the model output in the range [-1, 1]. Since I know the floor's surface normal should point up, it must be pointing along the negative y-axis.
Similarly, for other surfaces
(2) the wall directly in front of the camera has values (0, 128, 127), which corresponds to a (0, 0.5, 0.5) normal vector when normalized to [0, 1], and to (-1, 0, 0) when read directly from the model output in the range [-1, 1]. Since I know this wall's surface normal should point directly at the camera, it must be pointing along the negative x-axis.
(3) the ceiling has values (128, 254, 128), which corresponds to a (0.5, 1, 0.5) normal vector when normalized to [0, 1], and to (0, 1, 0) when read directly from the model output in the range [-1, 1]. Since I know the ceiling's surface normal should point straight down, it must be pointing along the positive y-axis.
(4) the right wall has values (123, 128, 0), which corresponds to a (0.5, 0.5, 0) normal vector when normalized to [0, 1], and to (0, 0, -1) when read directly from the model output in the range [-1, 1]. Since I know the right wall's surface normal should point left, it must be pointing along the negative z-axis.
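The decoding used in observations (1)-(4) can be sketched as below. It assumes the common encoding pixel = (n + 1) / 2 * 255, which matches the pixel values quoted above; the function name is mine, not from Metric3D.

```python
import numpy as np

def decode_normal_image(rgb_u8):
    """Map 8-bit normal-image values back to [-1, 1] vectors.

    Assumes pixel = (n + 1) / 2 * 255, so e.g. (126, 0, 126) decodes to
    approximately (0, -1, 0) and (128, 254, 128) to roughly (0, 1, 0).
    """
    n = rgb_u8.astype(np.float32) / 127.5 - 1.0
    # Re-normalize to unit length to undo 8-bit quantization error.
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n

# Floor pixel from observation (1): decodes to roughly (0, -1, 0),
# i.e. "up" in a y-down camera frame.
floor = decode_normal_image(np.array([[[126, 0, 126]]], dtype=np.uint8))
```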
Based on this interpretation, the coordinate system should look like this, which is a left-handed coordinate system.
Whereas if we swap the first and third channels (i.e., perform RGB2BGR), this is the normal output.
Now, if we interpret that data, we arrive at this coordinate system (with X and Z swapped), which is a right-handed coordinate system.
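The channel swap described above is just a reversal of the last axis (what `cv2.cvtColor` with `COLOR_RGB2BGR` does for 3-channel images). A minimal sketch using the right-wall pixel from observation (4):

```python
import numpy as np

# Swapping the first and third channels (RGB -> BGR) exchanges the
# roles of the x and z axes in the normal visualization.
rgb = np.array([[[123, 128, 0]]], dtype=np.uint8)  # right-wall pixel
bgr = rgb[..., ::-1]                               # -> (0, 128, 123)
```

So a pixel that decoded to a negative-z normal in RGB order decodes to a negative-x normal after the swap, which is exactly the left-handed vs. right-handed ambiguity discussed above.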
I am trying to use the surface normal data to calculate vector angles for my downstream task. Let me know if my understanding of the coordinate system is correct. This is confusing because other surface normal estimation methods seem to use different coordinate-system-to-RGB correspondence conventions.
See https://vision-explorer.allenai.org/surface_normals where the orientation is provided when you calculate the normal.
EDIT: I figured out the coordinate system of the normals output by the model. It is the second one in the discussion above (z points into the scene, x to the right, and y points downwards). The cv2 imwrite convention (it expects BGR channel order) was messing up my understanding. @HardikJain02
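For anyone hitting the same pitfall: `cv2.imwrite` interprets a 3-channel array as BGR, so an RGB-ordered normal image must have its channels reversed before saving or the visualization looks x/z-swapped. A minimal sketch of the encoding (the helper name is mine):

```python
import numpy as np

def encode_normals_for_imwrite(normals):
    """Encode unit normals in [-1, 1] as a uint8 image for cv2.imwrite.

    cv2.imwrite treats the array as BGR, so we reverse the last axis;
    skipping this step is what makes x and z appear exchanged.
    """
    img_rgb = ((normals + 1.0) * 127.5).clip(0, 255).astype(np.uint8)
    return img_rgb[..., ::-1]  # RGB -> BGR

# Floor normal (0, -1, 0) encodes to RGB (127, 0, 127); reversing for
# BGR leaves this particular symmetric triple unchanged.
floor_bgr = encode_normals_for_imwrite(np.array([[[0.0, -1.0, 0.0]]]))
```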
To my understanding, the depth map is camera-centric. What about the normal map? Is it camera-centric or in a world coordinate system?
If it is camera-centric, how can we convert it to the world coordinate system?
Context: I am interested in finding the 3D coordinates of the bounding boxes and polygons detected in a given 2D RGB image. Any other way to find these coordinates would be fine as well.
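One common way to get 3D coordinates for bbox or polygon vertices is to back-project each pixel using the metric depth and the camera intrinsics. A minimal pinhole-camera sketch; the intrinsics below are made-up placeholders, and getting world (rather than camera) coordinates still requires a known camera pose:

```python
import numpy as np

# Hypothetical pinhole intrinsics (fx, fy, cx, cy); use your camera's values.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

def backproject(u, v, depth):
    """Lift a pixel (u, v) with metric depth to camera-frame XYZ.

    Convention: x right, y down, z into the scene (matching the normals
    discussed above). For world coordinates, additionally apply a rigid
    camera-to-world transform: X_world = R @ X_cam + t.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# A bbox corner at the principal point (320, 240) with 2.0 m depth
# lies on the optical axis, 2 m in front of the camera.
p = backproject(320.0, 240.0, 2.0)
```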
@JUGGHM