Closed wyt1004 closed 1 year ago
Hi, thank you for your interest in our work! Could you please share some examples of the input and model output? Otherwise it is difficult to give advice.
Generally, there are a few common bugs to watch out for.
Best, Felix
Hi Felix! Thank you very much for your prompt reply, and sorry for my late response; your comment prompted me to recheck my input.
Because the pose that ships with the nuScenes dataset is a two-dimensional pose in the vehicle-body coordinate system, I substituted my own dataset as input so that it matches the three-dimensional pose input your model expects. The depth prediction result is not ideal, but surprisingly the moving-object mask prediction works quite well. So that you can advise on my problem, here are my inputs and outputs:
Image size: 1242×376, with image1 as the target frame. Original input:

```python
pose0 = torch.Tensor([[0.9998010436283502, -0.019835447741421597, 0.002104276343901, 313.607],
                      [0.0197909081291784, 0.9996156293120828, 0.019414252917772, 33.7917],
                      [-0.002488558107939, -0.019368744766508, 0.999809311128443, -0.909],
                      [0.0, 0.0, 0.0, 1.0]])
pose1 = torch.Tensor([[0.9998283201686516, -0.018479183135248163, 0.0013600962840288001, 314.93],
                      [0.018449016092791842, 0.9996378881999006, 0.019588926276083763, 33.8058],
                      [-0.0017215912568832, -0.01956047080455624, 0.9998071933995607, -0.911],
                      [0.0, 0.0, 0.0, 1.0]])
pose2 = torch.Tensor([[0.9998530080199892, -0.0171253369855544, 0.0008277782425960001, 316.253],
                      [0.0171069642112456, 0.9996802938437198, 0.01861885984505152, 33.82],
                      [-0.001146367813084, -0.01860196225094848, 0.9998263113546595, -0.917],
                      [0.0, 0.0, 0.0, 1.0]])
intrinsics = [9.2894080946388851e+02, 0.000000000000e+00, 6.338766060393413e+02,
              0.000000000000e+00, 9.2690441537547053e+02, 1.6755950806040642e+02,
              0.000000000000e+00, 0.000000000000e+00, 1.000000000000e+00]
```
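For reference, a minimal sketch (my own, not from the repo) of reshaping the flat intrinsics list above into the 3×3 camera matrix most depth pipelines expect, assuming it is laid out row-major:

```python
import torch

# Flat, row-major intrinsics list as posted above
intrinsics = [9.2894080946388851e+02, 0.0, 6.338766060393413e+02,
              0.0, 9.2690441537547053e+02, 1.6755950806040642e+02,
              0.0, 0.0, 1.0]

# Reshape into the usual 3x3 camera matrix K (assumption: row-major layout)
K = torch.tensor(intrinsics, dtype=torch.float64).reshape(3, 3)
fx, fy = float(K[0, 0]), float(K[1, 1])  # focal lengths in pixels
cx, cy = float(K[0, 2]), float(K[1, 2])  # principal point
print(fx, fy, cx, cy)
```

As a quick sanity check, the principal point (cx, cy) should lie reasonably near the image center (here 1242×376).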
image0:
image1:
image2:
Output result:
prediction depth:
prediction mask:
output keyframe:
I checked the data and it seems there is an issue with the poses (wrong coordinate system). E.g., in my data the z axis faces forward, while here the x axis appears to face forward. Therefore the computed cost volume is meaningless, and the network marks all cars as moving.
The poses have to be in the same format as for KITTI etc. Best, Felix
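To make the conversion concrete, here is a minimal sketch (not from the repo; the exact axis mapping is an assumption to verify against your dataset) of conjugating a 4×4 pose so that the forward direction moves from the x axis to the z axis, plus a quick diagnostic of which axis the ego motion points along:

```python
import torch

# Basis change: a convention with x facing forward (as the poses above
# suggest) -> KITTI camera convention with z facing forward.
# ASSUMED mapping -- verify the signs/axes for your own data.
C = torch.tensor([[0., -1., 0., 0.],   # cam x (right)   = -src y
                  [0., 0., -1., 0.],   # cam y (down)    = -src z
                  [1., 0., 0., 0.],    # cam z (forward) =  src x
                  [0., 0., 0., 1.]])

def to_kitti_convention(pose: torch.Tensor) -> torch.Tensor:
    # Conjugation permutes both the world and body axes consistently.
    return C @ pose @ C.T

def dominant_translation_axis(pose_a: torch.Tensor, pose_b: torch.Tensor) -> int:
    # Which axis does the ego motion between two frames mainly point along?
    # For KITTI-style camera poses this should be index 2 (z forward).
    rel = torch.linalg.inv(pose_a) @ pose_b
    return int(rel[:3, 3].abs().argmax())

# Synthetic example: ~1.3 m forward along the source x axis per frame
p0, p1 = torch.eye(4), torch.eye(4)
p1[0, 3] = 1.3
print(dominant_translation_axis(p0, p1))  # 0: motion along x
print(dominant_translation_axis(to_kitti_convention(p0),
                                to_kitti_convention(p1)))  # 2: motion along z
```

Running the same dominant-axis check on the real poses before and after conversion is an easy way to confirm the axes are now consistent with KITTI.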
Hi Felix! Following your suggestion, I converted the pose coordinate axes to the KITTI format. The depth-map prediction has improved, but it still does not reach the desired quality. I guess this means I should retrain the model on my own dataset? New predicted depth:
New prediction mask:
After a few tries, I got similar results. Maybe you really do have to retrain the model.
Best, Felix
Hi! Thanks for the great work and the open-sourced code! I tried to run inference on the nuScenes dataset using only the pre-trained model, but the results are bad. Could you tell me how to run inference correctly with nuScenes data, or should I retrain the model on nuScenes? Can you give me some suggestions? Thanks!