I think you can change it to `lidar_top` without issue. The point of "ref" is in fact to be flexible on what the reference coordinate system is (even across training iterations).
@aharley Thanks for your quick reply. When I change ref to `lidar_top`, should `rad_xyz_cam0` be changed to `rad_xyz_lidartop`, and `featcamXs` to `featlidartop`? In other words, should I transform the features into `lidar_top`, and then transform them from `lidar_top` coordinates into the mem coordinate system? Thank you very much!!
I guess you're talking about this kind of line:
https://github.com/aharley/simple_bev/blob/main/train_nuscenes.py#L146
You need to be quite careful here to make sure the coordinates are right. I could have been a bit clearer with the naming of `xyz_rad`: in the dataloader it is put into the same coordinates as the LiDAR, and this is why it's valid to apply `cams_T_velo[:,0]` to transform it into the first camera's coordinate system.
If you are using `lidar_top` as your reference coordinate system, then I think you can skip the transformations of the lidar and radar xyz, since those are already in `lidar_top` coordinates. Then you need to add a transformation to move the camera data into lidar coordinates, maybe by altering the transformation that goes into this argument:
https://github.com/aharley/simple_bev/blob/main/train_nuscenes.py#L182
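If it helps, here's a minimal self-contained sketch of the two choices of reference frame (the tensor names, shapes, and dummy extrinsics below are illustrative, not the exact variables in train_nuscenes.py):

```python
import torch

B, S = 2, 6  # batch size, number of cameras (dummy values)
# cams_T_velo: transforms points from lidar_top into each camera's frame
cams_T_velo = torch.eye(4).repeat(B, S, 1, 1)  # dummy extrinsics

# Reference = cam0 (what the code does now):
# cam0_T_camX = cam0_T_velo @ velo_T_camX
velo_T_cams = torch.linalg.inv(cams_T_velo)
cam0_T_cams = torch.matmul(cams_T_velo[:, 0:1], velo_T_cams)

# Reference = lidar_top: the lidar/radar points need no transform (they
# are already in lidar_top), and the camera branch can use velo_T_cams
# (camera -> lidar_top) directly as its "ref_T_cams" argument.
ref_T_cams = velo_T_cams
```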
@aharley Thanks for your reply.
You might be reading that wrong. The data comes from the RADAR_FRONT sensor, but the transformations lead it to the same coordinate system as the LiDAR data. Compare these lines:
https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L122
https://github.com/aharley/simple_bev/blob/main/nuscenesdataset.py#L191

I think what you wrote is correct. You shouldn't have to do extra transformations to take things into `mem`; you just need to put points into `lidar_top` (instead of putting them into `cam0` as done currently), and the rest will happen automatically. One stumbling block comes to mind though: if the axis directions of `lidar_top` are different than `cam0` (which has x=right, y=down, z=forward), then you may need to choose new dimensions for your voxelizer, or do a rotation.
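If a rotation is needed, a permutation matrix along these lines could work. This is only a sketch: it assumes the common automotive lidar convention (x=forward, y=left, z=up), which you should verify against the actual nuScenes LIDAR_TOP calibration before relying on it:

```python
import torch

# Maps points from an ASSUMED lidar convention (x=forward, y=left, z=up)
# into the cam-style convention (x=right, y=down, z=forward).
cam_axes_T_lidar_axes = torch.tensor([
    [0., -1.,  0., 0.],  # cam x (right)   = -lidar y (left)
    [0.,  0., -1., 0.],  # cam y (down)    = -lidar z (up)
    [1.,  0.,  0., 0.],  # cam z (forward) =  lidar x (forward)
    [0.,  0.,  0., 1.],
])

p_lidar = torch.tensor([2., 1., 0.5, 1.])  # 2m ahead, 1m left, 0.5m up
p_cam = cam_axes_T_lidar_axes @ p_lidar    # -> [-1.0, -0.5, 2.0, 1.0]
```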
@aharley
I see... If you are right about this, then maybe results could be improved slightly, because right now the radar is treated as though it were in LIDAR_TOP coordinates. Or maybe, due to noise and sparsity, the impact of this is small. Thanks for your attention to detail here!
@aharley Thank you for your patient reply. A few other questions:
`vox_inds = base + 0*dim2 + torch.floor((y - self.YMIN)/vox_size_Y)*dim3 + torch.floor((x - self.XMIN)/vox_size_X)`
`center_T_ref[:,0,3] = -self.XMIN/vox_size_X`
oops, didn't mean to close
- Yes
- That line looks a bit scary. Maybe it's correct, but it's hard for me to picture it right now; I usually write things in a more factorized way. (Right now `get_occupancy` doesn't care about the MIN variables and the vox_size variables.)
- I think your version would make the center of the voxel equal to XMIN. The current line makes the left edge of the voxel equal to XMIN. Both can be good, but I think, especially at low resolutions, using the edges to define your space makes more sense than using the centers. (E.g., think of a 1x1x1 voxel grid: it should have volume in the corresponding metric space.)
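To make the edge-vs-center distinction concrete, here is a tiny self-contained check (the grid bounds are made-up values, not the repo's defaults):

```python
# Edge-aligned vs center-aligned mapping from metric x to voxel ("mem") x.
XMIN, XMAX, X = -50.0, 50.0, 100
vox_size_X = (XMAX - XMIN) / X  # 1.0 meter per voxel

# Edge-aligned (current code): x = XMIN lands at mem x = -0.5, i.e. the
# left EDGE of voxel 0, and x = XMAX lands at X - 0.5, the right edge of
# voxel X-1, so the grid covers exactly [XMIN, XMAX].
print((XMIN - XMIN - vox_size_X / 2.0) / vox_size_X)  # -0.5
print((XMAX - XMIN - vox_size_X / 2.0) / vox_size_X)  # 99.5

# Center-aligned (proposed): x = XMIN lands at mem x = 0.0, the CENTER of
# voxel 0, so half of voxel 0 and half of voxel X-1 stick out past the
# stated bounds.
print((XMIN - XMIN) / vox_size_X)  # 0.0
print((XMAX - XMIN) / vox_size_X)  # 100.0 (past the last index, X-1 = 99)
```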
@aharley We can get a matrix from https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/utils/vox.py#L56:

```
1./vox_size_X   0               0               -self.XMIN - vox_size_X/2.0
0               1./vox_size_Y   0               -self.YMIN - vox_size_Y/2.0
0               0               1./vox_size_Z   -self.ZMIN - vox_size_Z/2.0
0               0               0               1
```

Then we can get the new xyz from https://github.com/aharley/simple_bev/blob/be46f0ef71960c233341852f3d9bc3677558ab6d/utils/vox.py#L57:

```
new_x = 1./vox_size_X * old_x - self.XMIN - vox_size_X/2.0
new_y = 1./vox_size_Y * old_y - self.YMIN - vox_size_Y/2.0
new_z = 1./vox_size_Z * old_z - self.ZMIN - vox_size_Z/2.0
```
I cannot understand why this line is calculated like this: `new_x = 1./vox_size_X * old_x - self.XMIN - vox_size_X/2.0`. I think the formula should be `new_x = 1./vox_size_X * (old_x - self.XMIN)`, and therefore `center_T_ref[:,0,3] = -self.XMIN/vox_size_X`.
Could you explain to me why this line is calculated like this?
https://github.com/aharley/simple_bev/blob/main/utils/vox.py#L90
Hi, I think my previous "3." answer provides the explanation for this. What's missing?
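To spell out the step that answer 3 relies on: the translation in `center_T_ref` is applied before the scaling in `mem_T_center`, so in the composed matrix the translation column ends up divided by the voxel size, not the raw `-self.XMIN - vox_size_X/2.0` written above. A small self-contained check (with made-up bounds):

```python
import numpy as np

XMIN, vox_size_X = -50.0, 0.5  # dummy values

center_T_ref = np.eye(4)
center_T_ref[0, 3] = -XMIN - vox_size_X / 2.0  # translate first (metric units)

mem_T_center = np.eye(4)
mem_T_center[0, 0] = 1.0 / vox_size_X          # then scale into voxel units

mem_T_ref = mem_T_center @ center_T_ref
# mem_T_ref[0, 3] == (-XMIN - vox_size_X/2) / vox_size_X, so overall:
# new_x = (old_x - XMIN - vox_size_X/2) / vox_size_X
new_x = (mem_T_ref @ np.array([XMIN, 0.0, 0.0, 1.0]))[0]
print(new_x)  # -0.5: XMIN lands on the left edge of voxel 0
```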
@aharley Thanks for your sharing. I don't know if I understand correctly. The following code completes the conversion from the ref to the mem coordinate system. Mem is the voxel coordinate system, i.e., the coordinates of the BEV feature. What does ref refer to? How do I determine this reference coordinate system? Can I specify a coordinate system at will? I understand that in the code it is cam_front. If I change to lidar_top as the reference coordinate system, does the code need to be changed? Looking forward to your reply, thank you very much!!
def get_mem_T_ref(self, B, Z, Y, X, assert_cube=False, device='cuda'):
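For context, here is roughly what the body of this function computes, paraphrased as a self-contained sketch (the repo builds the matrices with its own helpers, so treat the exact calls here as illustrative):

```python
import torch

def get_mem_T_ref(self, B, Z, Y, X, assert_cube=False, device='cuda'):
    # metric size of one voxel along each axis
    vox_size_X = (self.XMAX - self.XMIN) / float(X)
    vox_size_Y = (self.YMAX - self.YMIN) / float(Y)
    vox_size_Z = (self.ZMAX - self.ZMIN) / float(Z)

    # translation (applied first): shift so that XMIN maps to the left
    # edge of voxel 0 (hence the extra half-voxel offset)
    center_T_ref = torch.eye(4, device=device).repeat(B, 1, 1)
    center_T_ref[:, 0, 3] = -self.XMIN - vox_size_X / 2.0
    center_T_ref[:, 1, 3] = -self.YMIN - vox_size_Y / 2.0
    center_T_ref[:, 2, 3] = -self.ZMIN - vox_size_Z / 2.0

    # scaling (applied second): metric units -> voxel-index units
    mem_T_center = torch.eye(4, device=device).repeat(B, 1, 1)
    mem_T_center[:, 0, 0] = 1.0 / vox_size_X
    mem_T_center[:, 1, 1] = 1.0 / vox_size_Y
    mem_T_center[:, 2, 2] = 1.0 / vox_size_Z

    # mem_T_ref maps points in "ref" (whatever metric frame you choose)
    # into "mem" (voxel coordinates)
    return torch.matmul(mem_T_center, center_T_ref)
```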