aharley / simple_bev

A Simple Baseline for BEV Perception
MIT License
502 stars 79 forks source link

How do I know what REF is? #22

Closed ciwei123 closed 1 year ago

ciwei123 commented 1 year ago

@aharley Thanks for your sharing. I don't know if I understand correctly. The following code completes the conversion of ref to mem coordinate system. Mem is the coordinate of voxel, that is, the coordinate of bev feature. What does ref refer to?How do I determine this reference coordinate system? Can I specify a coordinate system at will? I understand that the code is cam_front. If I change to lidar_top as the reference coordinate system, does the code need to be changed? Looking forward to your reply, thank you very much!! def get_mem_T_ref(self, B, Z, Y, X, assert_cube=False, device='cuda'):

aharley commented 1 year ago

I think you can change it to lidar_top without issue. The point of "ref" is in fact to be flexible on what the reference coordinate system is (even across training iterations).

ciwei123 commented 1 year ago

@aharley Thanks for your quick reply. When I change ref to lidar_top, the rad_xyz_cam0 should be change to rad_xyz_lidartop? featcamXs should be change to featlidartop? In other words, I should change the feature to lidar top, and then change the feature in lidar top coordinate to mem coordinate system? Thank you very much!!

aharley commented 1 year ago

I guess you're talking about this kind of line: https://github.com/aharley/simple_bev/blob/main/train_nuscenes.py#L146 You need to be quite careful here to make sure the coordinates are right. I could have been a bit clearer with the naming of xyz_rad-- in the dataloader it's put into the same coordinates as the LiDAR, and this is why it's valid to apply cams_T_velo[:,0] to transform it into the first camera's coordinate system.

If you are using lidar_top as your reference coordinate system, then I think you can skip the transformations of lidar and radar xyz, since those are in lidar_top coordinates already. Then you need to add a transformation to move the camera data into lidar coordinates, maybe by altering the transformation that goes into this argument: https://github.com/aharley/simple_bev/blob/main/train_nuscenes.py#L182

ciwei123 commented 1 year ago

@aharley Thanks for your reply.

  1. "xyz_rad-- in the dataloader it's put into the same coordinates as the LiDAR" means that the "xyz_rad" data in LIDAR coordinates? But I understand that the radar data obtained here is in the RADAR_FRONT coordinate system https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L1054
  2. My core doubt is why it needs to be converted to the mem coordinate system? Is my understanding below correct: The mem coordinate is not a real coordinate, it voxels the ref coordinate system. Whether the mem_T_ref matrix is related to the reference coordinate : https://github.com/aharley/simple_bev/blob/main/utils/vox.py#L56. If I change the reference coordinate to LIDAR_TOP, mem_T_ref is still the same, right?
    Thank you very much!!
aharley commented 1 year ago
  1. You might be reading that wrong. The data comes from the RADAR_FRONT sensor, but the transformations lead it to the same coordinate system as the LiDAR data. Compare these lines: https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L122 https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L191

  2. I think what you wrote is correct. You shouldn't have to do extra transformations to take things into mem; you just need to put points into lidar_top (instead of putting them into cam0 as done currently), and the rest will happen automatically. One stumbling block comes to mind though: if the axis directions of lidar_top are different than cam0 (which has x=right, y=down, z=forward), then you may need to choose new dimensions for your voxelizer, or do a rotation.

ciwei123 commented 1 year ago
  1. You might be reading that wrong. The data comes from the RADAR_FRONT sensor, but the transformations lead it to the same coordinate system as the LiDAR data. Compare these lines: https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L122 https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L191
  2. I think what you wrote is correct. You shouldn't have to do extra transformations to take things into mem; you just need to put points into lidar_top (instead of putting them into cam0 as done currently), and the rest will happen automatically. One stumbling block comes to mind though: if the axis directions of lidar_top are different than cam0 (which has x=right, y=down, z=forward), then you may need to choose new dimensions for your voxelizer, or do a rotation.

@aharley

  1. I think the radar data from https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L1054 in RADAR_FRONT coordinate. The line : https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L191 convert the different radar sensor data to RADAR_FRONT coordinate, because car_from_global is from RADAR_FRONT in https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L162
  2. You are right. cam0: XMIN, XMAX = -50, 50 ZMIN, ZMAX = -50, 50 YMIN, YMAX = -5, 5, LIDAR_TOP should be: XMIN, XMAX = -50, 50 ZMIN, ZMAX = -5, 5 YMIN, YMAX = -50, 50,(which has x=righ, y=forward, z=up)
aharley commented 1 year ago
  1. I see... If you are right about this, then maybe results could be improved slightly, because right now the radar is treated as though it were in LIDAR_TOP coordinates. Or maybe due to noise and sparsity, the impact of this is small. Thanks for your attention to detail here!
ciwei123 commented 1 year ago
  1. I see... If you are right about this, then maybe results could be improved slightly, because right now the radar is treated as though it were in LIDAR_TOP coordinates. Or maybe due to noise and sparsity, the impact of this is small. Thanks for your attention to detail here!

@aharley Thank you for your patient reply. A few other questions:

  1. The purpose of this line is to calculate which voxel index a point falls on,right? https://github.com/aharley/simple_bev/blob/main/utils/vox.py#L207
  2. If I do not convert to the mem coordinate system, I think the vox_inds should be: vox_inds = base + 0* dim2 + torch.floor((y-(self.YMIN))/vox_size_Y )* dim3 + torch.floor((x-(self.XMIN))/vox_size_X)
    (0*dim2,because I don't consider the Z axis) , right?
  3. Could you explain to me why this line is calculated like this: https://github.com/aharley/simple_bev/blob/main/utils/vox.py#L90, I think it should be: center_T_ref[:,0,3] = -self.XMIN/vox_size_X
aharley commented 1 year ago
  1. Yes
  2. That line looks a bit scary. Maybe it's correct, but it's hard for me to picture it right now. I usually write things in a more factorized way. (Right now get_occupancy doesn't care about the MIN variables and vox_size variables.)
  3. I think your version would make the center of the voxel equal to XMIN. The current line makes the left edge of the voxel equal to XMIN. Both can be good, but I think especially in low resolutions, making the edges to define your space makes more sense than using the centers. (e.g., think of a 1x1x1 voxelgrid. it should have volume in the corresponding metric space.)
aharley commented 1 year ago

oops, didn't mean to close

ciwei123 commented 1 year ago
  1. Yes
  2. That line looks a bit scary. Maybe it's correct, but it's hard for me to picture it right now. I usually write things in a more factorized way. (Right now get_occupancy doesn't care about the MIN variables and vox_size variables.)
  3. I think your version would make the center of the voxel equal to XMIN. The current line makes the left edge of the voxel equal to XMIN. Both can be good, but I think especially in low resolutions, making the edges to define your space makes more sense than using the centers. (e.g., think of a 1x1x1 voxelgrid. it should have volume in the corresponding metric space.)

@aharley We can get a metric from https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/utils/vox.py#L56 1./vox_size_X 0 0 -self.XMIN-vox_size_X/2.0 1./vox_size_Y 0 0 -self.YMIN-vox_size_Y/2.0 1./vox_size_Z 0 0 -self.ZMIN-vox_size_Z/2.0 0 0 0 1

Then we can get new xyz from https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/utils/vox.py#L57 new_x = 1./vox_size_X * old_x - self.XMIN-vox_size_X/2.0 new_y = 1./vox_size_Y * old_y - self.YMIN-vox_size_Y/2.0 new_z = 1./vox_size_Z * old_z - self.ZMIN-vox_size_z/2.0 I can not understand why this line is calculated like thisnew_x = 1./vox_size_X * old_x - self.XMIN-vox_size_X/2.0 I think the formula should be : new_x = 1./vox_size_X * ( old_x - self.XMIN) , so I think the center_T_ref[:,0,3] = -self.XMIN/vox_size_X . Could you explain to me why this line is calculated like this: https://github.com/aharley/simple_bev/blob/main/utils/vox.py#L90.

aharley commented 1 year ago

Hi, I think my previous "3." answer provides the explanation for this. What's missing?