Brummi / MonoRec

Official implementation of the paper: MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera (CVPR 2021)

Inference on nuScenes dataset #48

Closed · wyt1004 closed this 1 year ago

wyt1004 commented 1 year ago

Hi! Thanks for the great work and the open-sourced code! I tried to run inference on the nuScenes dataset using only the pre-trained model, but the results are poor. Could you tell me how to use the nuScenes data for inference correctly? Or should I retrain the model on nuScenes data? Can you give me some suggestions? Thanks!

Brummi commented 1 year ago

Hi, thank you for your interest in our work! Could you please share some examples of the input and model output? Otherwise it is difficult to give advice.

Generally, there are a few common bugs to watch out for; in particular, make sure the poses are in the correct coordinate system and format.

Best, Felix

wyt1004 commented 1 year ago

Hi Felix! Thank you very much for your prompt reply! Sorry for the late response; prompted by your reply, I went back and rechecked my input.

Because the pose that comes with the nuScenes dataset is a two-dimensional pose in the car-body coordinate system, and your model expects full three-dimensional poses, I substituted my own dataset as input. The depth prediction result is not ideal, but surprisingly, the moving-object mask prediction is quite good. So that you can advise me, here are my input and output:

Image size: 1242×376, with image1 as the target frame. Original input:

pose0 = torch.Tensor([[0.9998010436283502, -0.019835447741421597, 0.002104276343901, 313.607],
                      [0.0197909081291784, 0.9996156293120828, 0.019414252917772, 33.7917],
                      [-0.002488558107939, -0.019368744766508, 0.999809311128443, -0.909],
                      [0.0, 0.0, 0.0, 1.0]])

pose1 = torch.Tensor([[0.9998283201686516, -0.018479183135248163, 0.0013600962840288001, 314.93],
                      [0.018449016092791842, 0.9996378881999006, 0.019588926276083763, 33.8058],
                      [-0.0017215912568832, -0.01956047080455624, 0.9998071933995607, -0.911],
                      [0.0, 0.0, 0.0, 1.0]])

pose2 = torch.Tensor([[0.9998530080199892, -0.0171253369855544, 0.0008277782425960001, 316.253],
                      [0.0171069642112456, 0.9996802938437198, 0.01861885984505152, 33.82],
                      [-0.001146367813084, -0.01860196225094848, 0.9998263113546595, -0.917],
                      [0.0, 0.0, 0.0, 1.0]])

intrinsics = [9.2894080946388851e+02, 0.000000000000e+00, 6.338766060393413e+02,
              0.000000000000e+00, 9.2690441537547053e+02, 1.6755950806040642e+02,
              0.000000000000e+00, 0.000000000000e+00, 1.000000000000e+00]
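As a sanity check on these inputs (a minimal sketch in plain PyTorch, not MonoRec code; it reuses the pose0–pose2 tensors and the flat intrinsics list defined above):

```python
import torch

# Reshape the flat 9-element intrinsics list from above into a 3x3 K matrix.
K = torch.tensor(intrinsics).reshape(3, 3)

def check_rigid(pose: torch.Tensor, tol: float = 1e-4) -> bool:
    """Check that a 4x4 matrix is a rigid transform: orthonormal rotation
    block with determinant +1 and a [0, 0, 0, 1] bottom row."""
    R = pose[:3, :3]
    ortho_err = (R @ R.T - torch.eye(3)).abs().max()
    det_err = (torch.det(R) - 1.0).abs()
    row_err = (pose[3] - torch.tensor([0.0, 0.0, 0.0, 1.0])).abs().max()
    return bool(ortho_err < tol) and bool(det_err < tol) and bool(row_err < tol)

for name, pose in [("pose0", pose0), ("pose1", pose1), ("pose2", pose2)]:
    print(name, "is rigid:", check_rigid(pose))
```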

Input images image0, image1, image2 (attachments not reproduced here).

Output results (attachments not reproduced here):

predicted depth: depth_yt
predicted mask: mask_yt
output keyframe: kf_yt

Brummi commented 1 year ago

I checked out the data, and it seems there is an issue with the poses (wrong coordinate system). For example, in my data the z-axis points forward, while here the x-axis appears to point forward. As a result, the computed cost volume is meaningless and the network marks all cars as moving.

The poses have to be in the same format as for KITTI etc.

Best, Felix
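One quick way to spot this kind of convention mismatch (a hedged sketch in plain PyTorch, not part of the MonoRec code): compute the relative transform between consecutive camera-to-world poses and check which translation axis dominates. For a KITTI-style camera frame the z component should carry the forward motion; in the poses posted above it is the x component that grows (313.607 → 314.93 → 316.253).

```python
import torch

def forward_axis(pose_a: torch.Tensor, pose_b: torch.Tensor) -> int:
    """Return the index (0=x, 1=y, 2=z) of the translation axis that
    dominates the motion from pose_a to pose_b, expressed in pose_a's
    local frame (both poses assumed camera/body-to-world, 4x4)."""
    rel = torch.inverse(pose_a) @ pose_b  # relative transform in pose_a's frame
    t = rel[:3, 3]
    return int(t.abs().argmax())

# With pose0/pose1 from above this returns 0 (x-forward); a KITTI-style
# camera pose would return 2 (z-forward).
print(forward_axis(pose0, pose1))
```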

wyt1004 commented 1 year ago

Hi Felix! Following your suggestion, I converted the pose axes to the KITTI convention. The depth-map prediction has improved, but it still does not reach the desired quality. I guess this means I should retrain the model on my own dataset?

New predicted depth: depth_yt

New predicted mask: mask_yt
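For completeness, such a conversion could look like the sketch below. It is an assumption-laden example: the permutation matrix R_cb maps a vehicle body frame with x forward, y left, z up (the nuScenes body convention) to a KITTI-style camera frame with x right, y down, z forward; in practice the actual body-to-camera extrinsics of the sensor should be used instead of a pure axis permutation.

```python
import torch

# Fixed rotation mapping body-frame axes (x fwd, y left, z up) to
# camera-frame axes (x right, y down, z fwd). Assumed, not from the repo.
R_cb = torch.tensor([
    [0.0, -1.0,  0.0],   # camera x = -body y
    [0.0,  0.0, -1.0],   # camera y = -body z
    [1.0,  0.0,  0.0],   # camera z =  body x
])
S = torch.eye(4)
S[:3, :3] = R_cb

def body_to_camera_convention(pose_body: torch.Tensor) -> torch.Tensor:
    """Re-express a 4x4 body-to-world pose in camera-convention axes
    by conjugating with the fixed axis permutation S."""
    return S @ pose_body @ torch.inverse(S)

pose0_cam = body_to_camera_convention(pose0)
```

After this conjugation, the relative motion between consecutive poses shows up on the z-axis, as expected for KITTI-style poses.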

Brummi commented 1 year ago

After a few tries, I got similar results. Maybe you really do have to retrain the model.

Best, Felix