Cross-Sequence calibration errors

tobiasfshr commented 1 year ago

According to the paper, the sequences are registered in a city-scale coordinate system. However, when I overlay the LiDAR scans of multiple sequences, the ground planes do not align well for some of them (see picture).

These are the sequence IDs I overlaid:
b9f73e2a-292a-3876-b363-3ebb94584c7a
8aad8778-73ce-3fa0-93c7-804ac998667d
b51561d9-08b0-3599-bc78-016f1441bb91
6b0cc3b0-2802-33a7-b885-f1f1409345ac
c990cafc-f96c-3107-b213-01d217b11272
0c61aea3-3cba-35f3-8971-df42cd5b9b1a
7cb4b11f-3872-3825-83b5-622e1a2cdb28
7c30c3fc-ea17-38d8-9c52-c75ccb112253
a2f568b5-060f-33f0-9175-7e2062d86b6c
cea5f5c2-e786-30f5-8305-baead8923063
f41d0e8f-856e-3f7d-a3f9-ff5ba7c8e06d
c654b457-11d4-393c-a638-188855c8f2e5
a359e053-a350-36cf-ab1d-a7980afaffa2
6f2f7d1e-8ded-35c5-ba83-3ca906b05127

[image: ezgif-3-1130390f04]

argoverse-admin commented 1 year ago

Hi @tobiasfshr, thank you for your interest in Argoverse 2. Judging from the log IDs you shared (e.g. https://github.com/argoverse/av2-api/blob/main/src/av2/datasets/sensor/splits.py#L515), it looks like you are working with the Argoverse 2 Sensor Dataset.

The behavior you are seeing is actually expected, as each local map is relevant only to a single log. Please refer to our NeurIPS '21 paper. In the abstract, we mention:

In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry —
sourced from data captured in six distinct cities.

We provide more detail in Section 3.4:

Each scenario carries its own local map region, similar to the Waymo Open Motion [12] dataset. 
This is a departure from the original Argoverse datasets in which all scenarios were localized onto
two city-scale maps—one for Pittsburgh and one for  Miami. In the Appendix, we provide examples.
Advantages of per-scenario maps include more efficient queries and their ability to handle map
changes. A particular intersection might be observed multiple times in our datasets, and there could
be changes to the lanes, crosswalks, or even ground height in that time.

Please also see the section on HD maps in the Argoverse User Guide: https://argoverse.github.io/user-guide/api/hd_maps.html#overview, or in the Sensor Dataset section of the user guide.

tobiasfshr commented 12 months ago

Hey, thanks for the response! You are right, I'm talking about the sensor dataset. However, I do not use the HD maps; I use the poses provided in city_SE3_egovehicle, which according to the documentation should be in a global, city-scale coordinate system (https://argoverse.github.io/user-guide/datasets/sensor.html#pose). The paper also mentions in Sec. 3.1:

In addition, camera intrinsics, extrinsics and 6-DOF ego-vehicle pose in a global coordinate system are provided.

I can successfully use those poses to find sequences that span the same geographic area, but once I overlay the LiDAR sweeps as above, I get problems with the ground height. Is there any way to align them with the data provided, or will I have to calculate the transformations myself?
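
For readers following along, here is a minimal sketch of what "overlaying sweeps with city_SE3_egovehicle" amounts to: each ego-frame point is rotated and translated by the pose nearest to the sweep timestamp. It assumes the pose is stored as a scalar-first quaternion (qw, qx, qy, qz) plus a translation in metres (tx_m, ty_m, tz_m), which is how I read the user guide; the function names and the toy pose are hypothetical and not part of av2-api.

```python
import numpy as np


def quat_to_rotmat(qw, qx, qy, qz):
    """Convert a scalar-first quaternion to a 3x3 rotation matrix."""
    q = np.array([qw, qx, qy, qz], dtype=float)
    q /= np.linalg.norm(q)  # guard against slightly non-unit quaternions
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def ego_to_city(points_ego, quat_wxyz, translation_m):
    """Map (N, 3) ego-frame points into the log's city frame: p_city = R p_ego + t."""
    R = quat_to_rotmat(*quat_wxyz)
    return points_ego @ R.T + np.asarray(translation_m, dtype=float)


# Toy pose (made up): 90 degree yaw about z, translated by (10, 5, 0) m.
half_angle = np.pi / 4
pose_quat = (np.cos(half_angle), 0.0, 0.0, np.sin(half_angle))
pose_t = (10.0, 5.0, 0.0)

sweep_ego = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]])
sweep_city = ego_to_city(sweep_ego, pose_quat, pose_t)
print(sweep_city)
```

Applying this per sweep and concatenating the results gives one cloud per log; two such clouds only line up if both logs share the same city frame, which is the point in question here.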

argoverse-admin commented 11 months ago

Sorry for the confusion caused by the documentation; we should likely update it (feel free to open a PR here). The poses for each scenario are aligned to the local map specific to that scenario, so each log's "global" coordinate system is effectively its own local map frame. I recommend computing the alignment yourself if you wish to calibrate across sequences.

tobiasfshr commented 11 months ago

Okay, thanks for the explanation. But I think a log-specific global coordinate system might not be correct either, since I can successfully align sequences with the poses from city_SE3_egovehicle, just not perfectly (especially with respect to ground height). It seems to me that the sequences are registered in a city-scale, global coordinate system, but that the alignment is imprecise (possibly due to errors in GPS-based localization?). It would be helpful if you could add more details on the data collection process to the documentation, so that people can better understand what level of accuracy to expect.

James-Hays commented 11 months ago

Hi Tobias,

I expect that the individual sequences are accurately localized, but they are accurately localized to different "base maps" with slightly different coordinate systems. We didn't release the various city-scale "base maps" with Argoverse 2.

I expect there to be only a handful of centimeters in error between the lidar and the HD map elements accompanying each log. But there is no guarantee about the alignment between the coordinate systems of different logs.

You probably could solve for a transformation that aligns the various logs. Or even align them manually.
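
As a rough illustration of the simplest such transformation, the sketch below estimates only a vertical shift between two logs that already overlap in xy. It approximates the ground height in each grid cell by a low percentile of z and takes the median difference over the cells both clouds cover. The function name, cell size, and thresholds are hypothetical choices for illustration, not values from the dataset or the API, and a pure z-shift will not correct tilt or xy error.

```python
import numpy as np


def ground_height_offset(cloud_a, cloud_b, cell_m=2.0, ground_pct=5.0, min_pts=5):
    """Estimate the z-shift to add to cloud_b so its ground matches cloud_a.

    Both inputs are (N, 3) arrays expressed in roughly the same xy frame.
    """
    def cell_ground(cloud):
        # Bucket points into square xy cells and keep a low z percentile
        # per cell as a crude ground-height estimate.
        keys = np.floor(cloud[:, :2] / cell_m).astype(np.int64)
        heights = {}
        for key, z in zip(map(tuple, keys), cloud[:, 2]):
            heights.setdefault(key, []).append(z)
        return {
            k: np.percentile(v, ground_pct)
            for k, v in heights.items()
            if len(v) >= min_pts
        }

    ga, gb = cell_ground(cloud_a), cell_ground(cloud_b)
    shared = sorted(set(ga) & set(gb))
    if not shared:
        raise ValueError("clouds do not overlap in xy")
    # The median is robust to cells where one log saw a parked car or a curb.
    return float(np.median([ga[k] - gb[k] for k in shared]))


# Synthetic check: the same noisy ground patch, with log B sitting 0.3 m lower.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 20.0, size=(2000, 2))
z = rng.normal(0.0, 0.02, size=2000)
log_a = np.column_stack([xy, z])
log_b = log_a - np.array([0.0, 0.0, 0.3])

dz = ground_height_offset(log_a, log_b)
print(f"add {dz:.3f} m to log B")
```

The Python loop over points is slow for full sweeps; it is kept here for readability rather than speed.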

tobiasfshr commented 11 months ago

Hey James, thank you for clarifying, that helps! I did align the logs via ICP now.
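
The thread does not say which ICP implementation was used. For reference, a minimal point-to-point ICP in plain NumPy might look like the sketch below: it alternates brute-force nearest-neighbour matching with a closed-form rigid fit (Kabsch). The function names, iteration count, and distance gate are hypothetical, and the data is synthetic.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Kabsch: least-squares R, t such that dst ~= src @ R.T + t for paired points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(src, dst, iters=30, max_dist=1.0):
    """Point-to-point ICP. Returns R, t mapping src onto dst."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # Brute-force nearest neighbours: O(N * M) memory, toy sizes only.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        keep = d2[np.arange(len(cur)), nn] < max_dist ** 2
        if keep.sum() < 3:
            break
        R, t = best_rigid_transform(cur[keep], dst[nn[keep]])
        cur = cur @ R.T + t
        # Compose the increment with the transform accumulated so far.
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total


# Synthetic check: recover a small yaw plus a ground-height offset.
rng = np.random.default_rng(1)
src = np.column_stack([
    rng.uniform(-10.0, 10.0, 200),
    rng.uniform(-10.0, 10.0, 200),
    rng.uniform(0.0, 3.0, 200),
])
yaw = np.deg2rad(0.5)
R_true = np.array([
    [np.cos(yaw), -np.sin(yaw), 0.0],
    [np.sin(yaw), np.cos(yaw), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.05, -0.03, 0.2])
dst = src @ R_true.T + t_true

R_est, t_est = icp(src, dst)
print(np.round(t_est, 3))
```

ICP only converges from a reasonable starting point, so the poses from city_SE3_egovehicle would serve as the initial guess. For real sweeps, a KD-tree (for example scipy.spatial.cKDTree) or a registration library such as Open3D would replace the brute-force matching.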

argoverse / av2-api

Cross-Sequence calibration errors #224