astra-vision / MonoScene

[CVPR 2022] "MonoScene: Monocular 3D Semantic Scene Completion": 3D Semantic Occupancy Prediction from a single image
https://astra-vision.github.io/MonoScene/
Apache License 2.0

About visualization code #69

Closed: joonsu0109gh closed this issue 1 year ago

joonsu0109gh commented 1 year ago

I'm attempting to use your visualization code, and I have a few questions. In this image, the 'fov_mask' and 'camera line' don't seem to be exactly the same. Which one is more precise?

[Image: visualization showing the fov_mask and the camera lines]

anhquancao commented 1 year ago

Hello, the FOV mask is more accurate. I made the camera bigger so that it is easier to see in the visualization.

joonsu0109gh commented 1 year ago

Thank you for your prompt response. I have one more question: could you give me the precise image pixel coordinates, or is there a specific reference I should use?

anhquancao commented 1 year ago

I don't understand your question. The FOV voxels are computed by projecting all voxels onto the image and checking whether the projected pixels fall inside the image and lie in front of the camera (depth > 0). The camera FOV is just a pentahedron. I took the 4 pixels at the corners of the image, fixed a depth value for them, and then unprojected them into 3D. They are 4 of the pentahedron's vertices; the fifth vertex is the camera position.
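A minimal sketch of the two computations described above, assuming an ideal pinhole camera (no lens distortion, zero skew) and points already expressed in the camera frame with +z pointing forward. The helper names (`project`, `fov_mask`, `unproject`, `frustum_vertices`) and the nested-list `K` are illustrative only, not MonoScene's actual code. A real pipeline would also have to transform voxel centers from the scene frame into the camera frame first.

```python
# Pinhole-camera sketch. ASSUMPTIONS: no distortion, zero skew, and all points
# already in the camera frame (+z forward). Names are hypothetical.

def project(point_cam, K):
    """Project a 3D camera-frame point to pixel coordinates (u, v)."""
    x, y, z = point_cam
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


def fov_mask(points_cam, K, width, height):
    """True for points in front of the camera whose projection lands in the image."""
    mask = []
    for p in points_cam:
        if p[2] <= 0:  # behind (or on) the camera plane: cannot be visible
            mask.append(False)
            continue
        u, v = project(p, K)
        mask.append(0 <= u < width and 0 <= v < height)
    return mask


def unproject(u, v, depth, K):
    """Inverse of project(): the 3D point at `depth` along the ray through (u, v)."""
    x = (u - K[0][2]) * depth / K[0][0]
    y = (v - K[1][2]) * depth / K[1][1]
    return (x, y, depth)


def frustum_vertices(K, width, height, depth):
    """Camera center (apex) plus the 4 image corners unprojected to a fixed depth."""
    # Corners use the pixel-edge convention (0..width, 0..height),
    # matching the 0 <= u < width bounds used in fov_mask().
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    base = [unproject(u, v, depth, K) for u, v in corners]
    return [(0.0, 0.0, 0.0)] + base
```

Because every base vertex lies on the ray through its corner pixel, changing `depth` only scales the pentahedron about the camera center; in principle the four edges leaving the apex are the same for any depth, and only the position of the base changes.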

joonsu0109gh commented 1 year ago

I apologize for the confusion. What I want to ask is how I can get the precise 3D coordinates of the 4 corners of the image.

anhquancao commented 1 year ago

You can take a look at this code.

joonsu0109gh commented 1 year ago

I've already looked at that code, and it seemed to me that the image corner coordinates it produces differ from the image extent that can be indirectly verified through the FOV mask. That's why I asked the first question. You answered that the camera was made larger for the visualization, so I'm asking how I can obtain the precise coordinates of the camera corners. Thank you for your answer.

anhquancao commented 1 year ago

I think you need to set d = the distance from the camera to the image plane (the focal length f). See http://www.cs.toronto.edu/~jepson/csc420/notes/imageProjection.pdf

But in KITTI, f is given in pixel units, so you will probably need to find a way to convert it back to meters.
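A sketch of the suggestion above: in a pinhole model the image plane sits at distance f in front of the optical center, so unprojecting the corner pixels at depth f places them on that plane. The intrinsic matrix only gives f in pixels, so this sketch assumes a known physical pixel pitch `pixel_size_m` (a hypothetical input; it is not part of `K`) to convert to meters. It also assumes zero skew. The function name `image_plane_corners` is illustrative.

```python
def image_plane_corners(K, width, height, pixel_size_m):
    """Corners of the image plane at z = f, assuming a pinhole camera.

    pixel_size_m is a HYPOTHETICAL input: the sensor's physical pixel pitch,
    which the intrinsic matrix K does not contain.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    f_m = fx * pixel_size_m  # focal length converted from pixels to meters
    corners_px = [(0, 0), (width, 0), (width, height), (0, height)]
    # Standard pinhole inverse: unproject each corner pixel at depth z = f_m.
    return [((u - cx) * f_m / fx, (v - cy) * f_m / fy, f_m) for u, v in corners_px]
```

With `pixel_size_m = 1.0` the same function returns the corners in pixel units (z = fx), which is exactly the unit issue raised in the comment above.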