aharley / simple_bev

A Simple Baseline for BEV Perception
MIT License

Slight Error regarding translating radar data between coordinate frames #53

Closed seamie6 closed 2 months ago

seamie6 commented 2 months ago

I think there is a slight issue with translating between different coordinate frames when constructing the radar data and preparing it for training.

The get_radar_data() function in nuscenesdataset.py contains this excerpt:

    # Get reference pose and timestamp.
    ref_sd_token = sample_rec['data']['RADAR_FRONT']
    ref_sd_rec = nusc.get('sample_data', ref_sd_token)
    ref_pose_rec = nusc.get('ego_pose', ref_sd_rec['ego_pose_token'])
    ref_cs_rec = nusc.get('calibrated_sensor', ref_sd_rec['calibrated_sensor_token'])
    ref_time = 1e-6 * ref_sd_rec['timestamp']

    # Homogeneous transformation matrix from global to _current_ ego car frame.
    car_from_global = transform_matrix(ref_pose_rec['translation'], Quaternion(ref_pose_rec['rotation']),inverse=True)

The end result is that all the radar data is stored relative to the ego frame at the time RADAR_FRONT was captured for the current record.
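To make the role of car_from_global concrete, here is a minimal NumPy sketch of an inverse ego-pose transform. It is not the devkit's transform_matrix: pose_matrix is a hypothetical stand-in that only handles yaw, and the pose values are made up for illustration.

```python
import numpy as np

def pose_matrix(translation, yaw, inverse=False):
    """Hypothetical stand-in for a 4x4 pose transform (yaw-only rotation)."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    trans = np.asarray(translation, dtype=float)
    mat = np.eye(4)
    if inverse:
        # Inverse of a rigid transform: R^T and -R^T t.
        mat[:3, :3] = rot.T
        mat[:3, 3] = -rot.T @ trans
    else:
        mat[:3, :3] = rot
        mat[:3, 3] = trans
    return mat

# Assumed ego pose in the global frame at the RADAR_FRONT timestamp.
ego_translation = [100.0, 50.0, 0.0]
ego_yaw = np.pi / 2  # car faces +y in the global frame

car_from_global = pose_matrix(ego_translation, ego_yaw, inverse=True)

# A global point 5 m ahead of the car along its heading.
point_global = np.array([100.0, 55.0, 0.0, 1.0])
point_car = car_from_global @ point_global
```

Under these assumptions the point lands 5 m along the ego x axis, i.e. directly in front of the car at the reference timestamp.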

Then in train_nuscenes.py we have:

    rad_xyz_cam0 = utils.geom.apply_4x4(cams_T_velo[:,0], xyz_rad)

which transforms the radar data from the ego frame at the time RADAR_FRONT was captured into the coordinate system of the reference camera, since index 0 holds the transformation from the ego frame to the reference camera's coordinate frame.
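For readers unfamiliar with this kind of call, the following is a rough NumPy sketch of what a batched 4x4 point transform does. apply_4x4_np is a hypothetical stand-in, not the repository's utils.geom.apply_4x4, and the camera-from-ego matrix here is a made-up pure translation (a real extrinsic would also rotate the axes).

```python
import numpy as np

def apply_4x4_np(RT, xyz):
    """Apply a batch of 4x4 transforms (B, 4, 4) to points (B, N, 3)."""
    B, N, _ = xyz.shape
    ones = np.ones((B, N, 1), dtype=xyz.dtype)
    xyz1 = np.concatenate([xyz, ones], axis=2)         # (B, N, 4) homogeneous
    xyz1_out = np.matmul(RT, xyz1.transpose(0, 2, 1))  # (B, 4, N)
    return xyz1_out.transpose(0, 2, 1)[:, :, :3]       # back to (B, N, 3)

# Assumed camera-from-ego transform: translation only, for illustration.
cam_T_ego = np.eye(4)[None]
cam_T_ego[0, :3, 3] = [-1.5, 0.0, -1.6]

# One radar point expressed in the ego frame.
xyz_rad = np.array([[[10.0, 2.0, 0.5]]])
rad_xyz_cam0 = apply_4x4_np(cam_T_ego, xyz_rad)
```

The output is only meaningful if xyz_rad and the transform refer to the same ego frame, which is the point raised in this issue.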

To my understanding, if we wanted everything to be exact, shouldn't get_radar_data() instead transform the radar data into the ego frame at the time the reference camera was captured? i.e.:

    ref_sd_token = sample_rec['data']['REF_CAM']

The difference between these two ego-frame poses should be very small, so I do not think the change would provide much benefit, but is my understanding correct? Thank you.
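As a rough way to gauge the size of the discrepancy, the sketch below compares the same point expressed in two ego frames. The speed (10 m/s) and the 20 ms gap between the radar and camera captures are assumptions chosen for illustration, not values measured from nuScenes, and pose_matrix is a hypothetical yaw-only helper.

```python
import numpy as np

def pose_matrix(translation, yaw):
    """Hypothetical helper: 4x4 global-from-ego transform (yaw-only rotation)."""
    c, s = np.cos(yaw), np.sin(yaw)
    mat = np.eye(4)
    mat[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    mat[:3, 3] = translation
    return mat

# Assumed motion: ego drives straight along +x at 10 m/s,
# and the camera frame is captured 20 ms after the radar frame.
speed = 10.0  # m/s (assumption)
dt = 0.02     # s (assumption)

global_from_ego_radar = pose_matrix([0.0, 0.0, 0.0], 0.0)
global_from_ego_cam = pose_matrix([speed * dt, 0.0, 0.0], 0.0)

# Transform taking points from the radar-time ego frame to the camera-time ego frame.
ego_cam_from_ego_radar = np.linalg.inv(global_from_ego_cam) @ global_from_ego_radar

point_ego_radar = np.array([20.0, 3.0, 0.0, 1.0])
point_ego_cam = ego_cam_from_ego_radar @ point_ego_radar

# Positional error from treating the two ego frames as identical.
offset = np.linalg.norm(point_ego_cam[:3] - point_ego_radar[:3])
```

With these assumed numbers the offset equals speed * dt, about 0.2 m. The actual gap depends on the real timestamps and ego motion in each sample.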