Owen-Liuyuxuan / visualDet3D

Official Repo for Ground-aware Monocular 3D Object Detection for Autonomous Driving / YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection
https://owen-liuyuxuan.github.io/papers_reading_sharing.github.io/3dDetection/GroundAwareConvultion/
Apache License 2.0
361 stars 76 forks

Training on custom dataset with shifted camera angle #69

Closed kevinstoesser closed 1 year ago

kevinstoesser commented 1 year ago

Hello,

First of all, thank you very much for your work and this repository! I am currently trying to adapt your algorithm to a new custom dataset. This dataset also contains stereo images that are rectified with a good stereo calibration. The main difference from KITTI is that both of our cameras are tilted 20° toward the ground. This rotation is included in the intrinsic parameters of our calibration files.

The problem is that an experiment with a configuration that should overfit does not actually overfit the data well. After training, once the losses saturate, the 2D bounding boxes fit quite well (even though they are slightly larger than the gt_labels). The 3D bounding boxes also fit quite well, but deviate in the x-z plane.

I suspect KITTI's anchor box priors don't fit, since our data was captured with the camera tilted. I also disabled the disparity loss during training to rule out an error there.

Do you have any idea how to fix this problem? I noticed that the anchor priors include parameters such as sizes and scales. How do these affect the anchor priors? When I try to compute anchor priors for our data, all anchor boxes are filtered out.

Thank you in advance, Kevin
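For reference, one way the anchor size priors can be recomputed from a custom dataset instead of reusing the KITTI statistics is to gather per-class dimension statistics directly from the label files. This is a hedged sketch, not the repo's actual prior-computation code; the label parsing follows the standard KITTI text format, and the function name and field indices are assumptions:

```python
import numpy as np

def dimension_priors(label_files, target_class="Car"):
    """Return mean and std of (h, w, l) for one class across KITTI-format labels.

    Hypothetical helper: the repo's own anchor-prior script may differ.
    """
    dims = []
    for path in label_files:
        with open(path) as f:
            for line in f:
                fields = line.split()
                if not fields or fields[0] != target_class:
                    continue
                # In the KITTI label format, fields 8..10 are
                # height, width, length in meters.
                dims.append([float(v) for v in fields[8:11]])
    dims = np.asarray(dims)
    return dims.mean(axis=0), dims.std(axis=0)
```

If these statistics differ strongly from KITTI's, or if the tilted camera moves projected boxes outside the image regions the prior computation expects, that could explain why all anchors end up filtered out.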

kevinstoesser commented 1 year ago

For those who also run into this problem: I transformed the gt labels into the KITTI coordinate system and added the coordinate conversion to BackProjection(nn.Module). To answer whether YOLOStereo3D works on different stereo setups: yes, it works quite well, with a small shift in depth caused by the different baseline.
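The core of that label transformation is a pitch rotation about the camera x-axis that maps points from the tilted camera frame into a level, KITTI-style frame (x right, y down, z forward). The following is a minimal sketch under that assumption; the 20° value and function names are placeholders for this specific rig, not code from the repo:

```python
import numpy as np

PITCH_DEG = 20.0  # assumed downward tilt of the cameras; adjust for your rig

def pitch_matrix(deg: float) -> np.ndarray:
    """Rotation about the camera x-axis (KITTI convention: x right, y down, z forward)."""
    r = np.deg2rad(deg)
    c, s = np.cos(r), np.sin(r)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def tilted_to_kitti(xyz: np.ndarray, deg: float = PITCH_DEG) -> np.ndarray:
    """Map (N, 3) box centers from the tilted camera frame to a level frame."""
    # Undo the downward pitch by rotating by -deg.
    return xyz @ pitch_matrix(-deg).T
```

As a sanity check, a point on the tilted optical axis should land below the level axis (positive y in KITTI coordinates) after the transform, at the same distance from the camera.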