astra-vision / MonoScene

[CVPR 2022] "MonoScene: Monocular 3D Semantic Scene Completion": 3D Semantic Occupancy Prediction from a single image
https://astra-vision.github.io/MonoScene/
Apache License 2.0

Problems about visualizing output on my dataset #99

Closed Benson722 closed 5 months ago

Benson722 commented 7 months ago

Thank you for your work.

When I run the KITTI-360 inference code on my own dataset with the models pretrained on SemanticKITTI, I'm confused by the output: the drivable area is too short. It seems the model can't handle distant areas. Is my config wrong, or did I forget some detail?

About the config: I modified the camera intrinsics and changed the image resolution from (1408×376) to (1280×720). The 'cam2velo' matrix was not modified.

The output is: [screenshot: output with unchanged extrinsics]

Look forward to your reply. thx :)

Benson722 commented 7 months ago

Btw, when I run the inference code, the number of images I used was 15. Does a smaller number of images affect the results?

anhquancao commented 7 months ago

Hi, is it the same for every image?

Benson722 commented 7 months ago

Yes, it is the same for every image.

I re-modified the image resolution and camera parameters. The result is a little better, but the road length is still very short.

I resize the images from (1280, 720) to (1408, 376), and scale the intrinsic parameters as follows:

    # cam_k is the original 3x3 intrinsic matrix (NumPy array)
    orig_size = (1280, 720)
    target_size = (1408, 376)

    scale_width = target_size[0] / orig_size[0]
    scale_height = target_size[1] / orig_size[1]

    adjusted_cam_k = cam_k.copy()
    adjusted_cam_k[0, 0] *= scale_width   # fx
    adjusted_cam_k[1, 1] *= scale_height  # fy
    adjusted_cam_k[0, 2] *= scale_width   # cx
    adjusted_cam_k[1, 2] *= scale_height  # cy

[screenshot: comparison 1]

[screenshot: comparison 2]

Is this a problem with the camera extrinsics, or with the image processing? The camera I use has a 120° FOV.
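As a sanity check, the horizontal FOV implied by a pinhole intrinsic matrix can be computed as 2·atan(W / (2·fx)). A minimal sketch of this check (the `cam_k` values below are hypothetical, chosen so that the 1280×720 image corresponds to a 120° camera; the scaling mirrors the snippet above):

```python
import numpy as np

def rescale_intrinsics(cam_k, orig_size, target_size):
    """Scale a 3x3 pinhole intrinsic matrix when the image is resized."""
    sx = target_size[0] / orig_size[0]
    sy = target_size[1] / orig_size[1]
    k = cam_k.copy()
    k[0, 0] *= sx  # fx
    k[1, 1] *= sy  # fy
    k[0, 2] *= sx  # cx
    k[1, 2] *= sy  # cy
    return k

def horizontal_fov_deg(cam_k, width):
    """Horizontal field of view implied by fx: 2 * atan(W / (2 * fx))."""
    return np.degrees(2.0 * np.arctan(width / (2.0 * cam_k[0, 0])))

# Hypothetical intrinsics for a 120-degree, 1280x720 camera:
# fx = 1280 / (2 * tan(60 deg)) ~= 369.5
cam_k = np.array([[369.5, 0.0, 640.0],
                  [0.0, 369.5, 360.0],
                  [0.0, 0.0, 1.0]])
k2 = rescale_intrinsics(cam_k, (1280, 720), (1408, 376))

print(horizontal_fov_deg(cam_k, 1280))  # ~120 degrees before resizing
print(horizontal_fov_deg(k2, 1408))     # still ~120: the rescale preserves FOV
```

Note that scaling fx together with the width preserves the horizontal FOV, so resizing alone cannot make a 120° camera match the much narrower FOV of the camera the model was pretrained on.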

Thank you.

anhquancao commented 6 months ago

Hello, are you using the dataloader that I provided? I have configured the full camera setup, including the camera intrinsics and extrinsics, in the provided dataloader code. You can find the relevant code at the following link: kitti_360_dataset.py#L73.

Benson722 commented 6 months ago

Yes, except for the data itself, the code has not been changed. I suspect something is wrong with the depth estimation for my images. Perhaps retraining the model on my dataset will solve this problem. Do you have any other suggestions besides retraining?

anhquancao commented 6 months ago

Hi, is this KITTI-360 or your own dataset? If it is your own dataset, then you probably need to retrain, since the camera setup is different, and that setup is implicitly incorporated into the model during training, i.e. through the projection module.
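To illustrate why the camera setup gets baked in: the features-to-voxel projection follows the standard pinhole model, mapping 3D points through the extrinsics and then the intrinsics. A minimal sketch of that projection (the matrices below are hypothetical, not MonoScene's actual calibration):

```python
import numpy as np

def project_points(pts_lidar, T_velo2cam, cam_k):
    """Project Nx3 lidar-frame points into pixel coordinates.

    pts_lidar:  (N, 3) points in the lidar (velodyne) frame.
    T_velo2cam: (4, 4) extrinsic transform, lidar -> camera.
    cam_k:      (3, 3) pinhole intrinsic matrix.
    Returns pixel coordinates and a mask of points in front of the camera.
    """
    n = pts_lidar.shape[0]
    homo = np.hstack([pts_lidar, np.ones((n, 1))])  # (N, 4) homogeneous
    cam = (T_velo2cam @ homo.T).T[:, :3]            # points in camera frame
    in_front = cam[:, 2] > 0                        # keep z > 0 only
    pix = (cam_k @ cam.T).T
    pix = pix[:, :2] / pix[:, 2:3]                  # perspective divide
    return pix, in_front

# Hypothetical setup: identity extrinsics, simple intrinsics.
cam_k = np.array([[500.0, 0.0, 640.0],
                  [0.0, 500.0, 360.0],
                  [0.0, 0.0, 1.0]])
T = np.eye(4)
pts = np.array([[0.0, 0.0, 10.0]])  # a point 10 m straight ahead
pix, valid = project_points(pts, T, cam_k)
print(pix, valid)  # lands at the principal point (640, 360)
```

With different intrinsics or extrinsics, the same 3D voxel maps to different image features, so a network trained under one calibration does not transfer directly to another.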