aharley / simple_bev

A Simple Baseline for BEV Perception
MIT License

Coordinate in vox_util #7

Closed shawrby closed 2 years ago

shawrby commented 2 years ago

Hi there,

Thank you for your research.

I don't fully understand the Ref, Center, Mem, camA, camB, and pixB coordinate systems in vox_util. Could you explain them in simple terms?

aharley commented 2 years ago

Great question.

A/B (in function names) are arbitrary coordinate systems -- they are just placeholders.

ref = reference camera. At test time, this is facing forward.

mem = "memory" coordinates == model coordinates == voxel coordinates

cam = camera = 3d coordinate system

pix = pixels = projected camera coordinates (2d)
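To make the mem definition concrete, here is a minimal NumPy sketch of how a metric reference frame could be mapped onto voxel indices. The helper name `make_mem_T_ref`, the bounds, and the grid size are assumptions chosen for illustration, not values or code taken from vox_util; the sketch also assumes that voxel centers sit at integer mem coordinates.

```python
import numpy as np

def make_mem_T_ref(bounds, Z, Y, X):
    """Hypothetical helper: build a 4x4 matrix that maps metric ref
    coordinates (x, y, z) to voxel ("mem") coordinates.

    bounds = (xmin, xmax, ymin, ymax, zmin, zmax) in meters.
    Z, Y, X = number of voxels along each axis.
    """
    xmin, xmax, ymin, ymax, zmin, zmax = bounds
    vox_x = (xmax - xmin) / X
    vox_y = (ymax - ymin) / Y
    vox_z = (zmax - zmin) / Z

    # Shift so that the center of the first voxel lands on the origin.
    center_T_ref = np.eye(4)
    center_T_ref[:3, 3] = [-xmin - vox_x / 2, -ymin - vox_y / 2, -zmin - vox_z / 2]

    # Scale meters to voxel units.
    mem_T_center = np.diag([1.0 / vox_x, 1.0 / vox_y, 1.0 / vox_z, 1.0])

    # Read right to left: ref -> center -> mem.
    return mem_T_center @ center_T_ref

# Assumed example scene: 100m x 10m x 100m volume on a 200 x 8 x 200 grid.
bounds = (-50.0, 50.0, -5.0, 5.0, -50.0, 50.0)
mem_T_ref = make_mem_T_ref(bounds, Z=200, Y=8, X=200)

# Two points at the centers of the first and last voxels, in ref coordinates.
xyz_ref = np.array([[-49.75, -4.375, -49.75],
                    [49.75, 4.375, 49.75]])
xyz1_ref = np.concatenate([xyz_ref, np.ones((2, 1))], axis=1)  # homogeneous
xyz_mem = (xyz1_ref @ mem_T_ref.T)[:, :3]
```

Under these assumptions the first point lands on voxel index (0, 0, 0) and the second on the last index along each axis.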

Feel free to ask more, to help me answer your precise question.

shawrby commented 2 years ago


I have an additional question.

The main parameters of the function in the picture below are,

Shouldn't the utils.geom.apply_4x4 function take mem_T_ref as its argument?


aharley commented 2 years ago

No, the notation goes the other way. The way to read it is: ref_T_mem transports mem points into ref coordinates. The visual shortcut is: ref_T_mem * xyz_mem is a valid matmul, because the mem coords are adjacent. (This convention lets us easily keep track of valid transformations, such as point_a = a_T_b * b_T_c * c_T_d * point_d.)
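The naming convention can be illustrated with a small standalone NumPy sketch. The `apply_4x4` below is a simplified, unbatched stand-in written for this example (it is not the repo's implementation), and the frames a, b, c with their translations are made up.

```python
import numpy as np

def apply_4x4(a_T_b, xyz_b):
    """Transport N x 3 points from frame b into frame a.
    Simplified stand-in for illustration, not the repo's function."""
    ones = np.ones((xyz_b.shape[0], 1))
    xyz1_b = np.concatenate([xyz_b, ones], axis=1)  # N x 4 homogeneous points
    xyz1_a = xyz1_b @ a_T_b.T                       # same as (a_T_b @ xyz1_b.T).T
    return xyz1_a[:, :3]

def translation(tx, ty, tz):
    """Build a 4x4 pure-translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

# Hypothetical frames: the inner letters must match for a chain to be valid.
a_T_b = translation(1.0, 0.0, 0.0)
b_T_c = translation(0.0, 2.0, 0.0)
a_T_c = a_T_b @ b_T_c          # the adjacent "b"s cancel
c_T_a = np.linalg.inv(a_T_c)   # inverting swaps the two letters

xyz_c = np.array([[0.0, 0.0, 0.0],
                  [1.0, 1.0, 1.0]])
xyz_a = apply_4x4(a_T_c, xyz_c)         # c points expressed in frame a
xyz_c_again = apply_4x4(c_T_a, xyz_a)   # and back to frame c
```

Reading each name right to left gives the direction of travel: `a_T_c` takes c points to frame a, so `apply_4x4(ref_T_mem, xyz_mem)` yields points in ref coordinates.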

shawrby commented 2 years ago

Thanks for the clear explanation!

shawrby commented 2 years ago


What exactly do cam0 and camXs mean?

aharley commented 2 years ago

cam0 is the camera being currently used as "reference" -- this dictates the orientation of the 3D/BEV tensors, which accordingly live in mem0 (i.e., ref2mem applied on cam0 things). At test time, cam0 is the forward-facing camera, and at training time this is a random camera. camXs is all other cameras.

shawrby commented 2 years ago


Thanks for the kind explanation!

shawrby commented 2 years ago


Sorry for the many questions.

I don't understand how the get_occupancy function builds the occupancy map from radar data. Could you recommend reference material or a paper that would help me understand it?

aharley commented 2 years ago

The process there is just to find out which voxels have a point inside, and then set the value to 1 for those voxels. This type of function is pretty common so you can maybe google "convert point cloud to occupancy grid" to see lots of answers.
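As a rough illustration of that idea, here is a minimal NumPy sketch. The function name `points_to_occupancy`, the (Z, Y, X) grid layout, and the rounding to the nearest voxel center are assumptions made for this example; it is not the repo's get_occupancy.

```python
import numpy as np

def points_to_occupancy(xyz_mem, Z, Y, X):
    """Hypothetical helper: mark every voxel that contains at least one point.

    xyz_mem: N x 3 array of points already in voxel coordinates (x, y, z).
    Returns a Z x Y x X float grid with 1.0 in occupied voxels, 0.0 elsewhere.
    """
    # Assumption: voxel centers sit at integer coordinates, so round to nearest.
    idx = np.round(xyz_mem).astype(np.int64)
    x, y, z = idx[:, 0], idx[:, 1], idx[:, 2]

    # Discard points that fall outside the grid.
    inside = (x >= 0) & (x < X) & (y >= 0) & (y < Y) & (z >= 0) & (z < Z)

    occ = np.zeros((Z, Y, X), dtype=np.float32)
    occ[z[inside], y[inside], x[inside]] = 1.0
    return occ

# Made-up points: two share a voxel, one lies outside the grid.
xyz_mem = np.array([[0.2, 0.1, 0.4],
                    [2.6, 1.0, 3.2],
                    [2.9, 1.1, 3.0],
                    [10.0, 0.0, 0.0]])
occ = points_to_occupancy(xyz_mem, Z=4, Y=2, X=4)
```

Several points in the same voxel still produce a single 1, so the grid records presence rather than point counts.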