Misalignment of the 3D bounding box on the image

electricmanlpl commented 4 years ago

Hi,

Many thanks for your great work. I am curious about the labeling process. Do you label the 3D bounding boxes purely on the point cloud, or do you also cross-validate them against the image? I observe some misalignment when projecting the 3D bounding boxes onto the image, especially for more distant objects. I wonder whether this is due to label accuracy or calibration accuracy.

Any answer will be appreciated.
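For readers following along, the projection being discussed can be sketched roughly as below. This is a minimal pinhole model, not the argoverse-api implementation: the helper names, the axis-aligned cuboid (no yaw), the camera-frame convention (x right, y down, z forward), and the intrinsics in the usage lines are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]


def cuboid_corners(center: Point3, length: float, width: float, height: float) -> List[Point3]:
    """Return the 8 corners of an axis-aligned cuboid (yaw omitted for brevity)."""
    cx, cy, cz = center
    return [
        (cx + sx * length / 2, cy + sy * width / 2, cz + sz * height / 2)
        for sx in (-1, 1)
        for sy in (-1, 1)
        for sz in (-1, 1)
    ]


def project_pinhole(
    point_cam: Point3, fx: float, fy: float, cx: float, cy: float
) -> Optional[Tuple[float, float]]:
    """Project a camera-frame point (x right, y down, z forward) to pixel coordinates.

    Returns None for points at or behind the image plane.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics and a cuboid 40 m in front of the camera.
corners = cuboid_corners((0.0, 0.0, 40.0), length=2.0, width=1.5, height=4.5)
pixels = [project_pinhole(c, fx=1000.0, fy=1000.0, cx=960.0, cy=600.0) for c in corners]
```

In a real pipeline the cuboid would first be transformed from the egovehicle frame into the camera frame using the calibration extrinsics, which is where pose or calibration errors would enter.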

johnwlambert commented 4 years ago

Hi @electricmanlpl, are you using the labels from Argoverse 1.1?

Also, are you using motion compensation, as we suggest in our cuboids_to_bboxes() demo script?
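The idea behind motion compensation can be sketched as follows: a point expressed in the egovehicle frame at the lidar timestamp is moved into the city frame, then back into the egovehicle frame at the camera timestamp. The sketch below is a simplified planar (x, y, yaw) version with hypothetical helper names. It is not the code in the demo script, and it assumes the observed point is static in the city frame.

```python
import math
from typing import Tuple

Point2 = Tuple[float, float]
Pose2D = Tuple[float, float, float]  # (x, y, yaw) of the egovehicle in the city frame


def ego_to_city(pt: Point2, pose: Pose2D) -> Point2:
    """Rotate by the vehicle yaw, then translate by the vehicle position."""
    x, y = pt
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (px + c * x - s * y, py + s * x + c * y)


def city_to_ego(pt: Point2, pose: Pose2D) -> Point2:
    """Inverse of ego_to_city: subtract the position, then rotate by -yaw."""
    x, y = pt
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = x - px, y - py
    return (c * dx + s * dy, -s * dx + c * dy)


def motion_compensate(pt_ego_lidar: Point2, pose_at_lidar: Pose2D, pose_at_cam: Pose2D) -> Point2:
    """Re-express a lidar-time point in the egovehicle frame at the camera time."""
    return city_to_ego(ego_to_city(pt_ego_lidar, pose_at_lidar), pose_at_cam)


# The vehicle drives 0.1 m straight ahead between the lidar sweep and the image,
# so a static point 50 m ahead ends up about 49.9 m ahead.
compensated = motion_compensate((50.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0))
```

The compensated point would then be projected into the image with the camera calibration, as for any other egovehicle-frame point.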

electricmanlpl commented 4 years ago

Thanks for your reply. My original post didn't use motion compensation, but it did use the 1.1 labels. I tried motion compensation in the following figure but didn't see much difference. The colored box is the label at the lidar timestamp; the yellow one is the compensated box.

log id: 53213cf0-540b-3b5a-9900-d24d1d41bda0 cam_timestamp: 315976455227286032 lidar_timestamp: 315976455220186000

log id: 53213cf0-540b-3b5a-9900-d24d1d41bda0 cam_timestamp: 315976454228285136 lidar_timestamp: 315976454220209000
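As a rough check on why compensation changes so little here, the camera/lidar offsets for the two frames above can be computed directly. The sketch assumes the timestamps are in nanoseconds, and the 10 m/s vehicle speed is a hypothetical value chosen only to give a sense of scale.

```python
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000

# (cam_timestamp, lidar_timestamp) pairs quoted in the comment above.
frames = [
    (315976455227286032, 315976455220186000),
    (315976454228285136, 315976454220209000),
]


def cam_lidar_offset_ns(cam_ts: int, lidar_ts: int) -> int:
    """Signed time difference between the image and the lidar sweep, in ns."""
    return cam_ts - lidar_ts


assumed_speed_mps = 10.0  # hypothetical ego speed, not taken from the log

for cam_ts, lidar_ts in frames:
    offset_ns = cam_lidar_offset_ns(cam_ts, lidar_ts)  # 7100032 and 8076136
    offset_ms = offset_ns / NS_PER_MS
    travel_m = assumed_speed_mps * offset_ns / NS_PER_S
    print(f"offset: {offset_ms:.2f} ms, ego travel at assumed speed: {travel_m:.3f} m")
```

Under these assumptions the offsets are roughly 7 to 8 ms, so the vehicle moves less than a tenth of a metre between the sweep and the image, which would be consistent with the compensated and uncompensated boxes looking nearly identical.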

James-Hays commented 4 years ago

Hi @electricmanlpl,

Sorry for the slow follow-up. I've been digging into this to try to find an answer. You're right that the distant objects in the front view of this log don't have very accurate cuboids. I don't believe it's a calibration issue, because other objects, including objects in the same view later in the log, look fairly accurately bounded by the cuboids. I believe something has gone slightly wrong with the vehicle pose (which can mess up our cuboid post-processing).

However, these objects are quite far away. The stereo cameras have a longer focal length (and so a narrower field of view) than the ring cameras, so it might not be obvious, but in this scene the AV is looking at the intersection of 17th and Liberty while it is nearly at 16th and Liberty. So the objects are ~160 meters away at times and don't have many lidar returns on them. Some recent AV datasets have cropped their annotations at 100 meters or less, but we included everything, even when noisy (like these objects). We do limit our tracking evaluation to 100 meters, which would exclude these objects for part of their lifetime (obviously the vehicles in opposing lanes get closer, but by then I believe the annotations are more accurate).

Please let us know if you have any other concerns. I'm glad to see people using the 3D annotations.
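To give a sense of scale for the range argument above, the sketch below shows how a small heading error in the vehicle pose grows into a lateral offset with distance, along with a simple 100 m range check. The 0.1 degree error is a hypothetical value, not a measurement from this log, and measuring range as planar distance from the egovehicle origin is an assumption about how such a cutoff might be applied.

```python
import math
from typing import Tuple


def lateral_offset_m(range_m: float, heading_error_deg: float) -> float:
    """Lateral displacement of a point at range_m caused by a heading error."""
    return range_m * math.tan(math.radians(heading_error_deg))


def within_eval_range(center_xy: Tuple[float, float], max_range_m: float = 100.0) -> bool:
    """True if a cuboid center (egovehicle frame, metres) lies within max_range_m."""
    return math.hypot(center_xy[0], center_xy[1]) <= max_range_m


for range_m in (20.0, 100.0, 160.0):
    offset = lateral_offset_m(range_m, heading_error_deg=0.1)
    print(f"{range_m:5.0f} m: lateral offset {offset:.3f} m, "
          f"in eval range: {within_eval_range((range_m, 0.0))}")
```

With these numbers the same 0.1 degree error displaces a point by about 0.03 m at 20 m but by about 0.28 m at 160 m, a noticeable fraction of a vehicle's width, so a box that looks well aligned nearby could appear visibly off at the far intersection.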

electricmanlpl commented 4 years ago

Hi @James-Hays, thanks for your answer. Yes, I also realize that the objects are very far away, and that it's extremely hard to get an accurate projection on the long-focal-length stereo camera. Thanks for your efforts and for the great API functions.

argoverse / argoverse-api, issue #78