Haiyang-W / UniTR

[ICCV2023] Official Implementation of "UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation"
https://arxiv.org/abs/2308.07732
Apache License 2.0

How to calculate the nearest neighbor between pseudo 3D grid points and image tokens? #2

Closed by miraclebiu 1 year ago

miraclebiu commented 1 year ago

Since each pixel in an image represents a ray in 3D space, I don't understand how to calculate the nearest neighbor between the pseudo 3D grid points and the image tokens in the section on the 3D geometric space. Do we need to estimate the depth for each image token?

Haiyang-W commented 1 year ago

We don't need to estimate the depth of each image token. We compute the nearest neighbor based on the 2D distance between the 3D grid points and image tokens. We assume that 2D adjacency also reflects 3D adjacency to some extent.

Frankly, there is definitely some error in this, and there is a lot of room for improvement in how to construct non-learnable 2D->3D one-to-one mappings.
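
For readers following along, here is a minimal sketch of one way to read the matching described above. It is not taken from the UniTR code base. It assumes the pseudo 3D grid points are already expressed in camera coordinates and are projected with a pinhole intrinsic matrix `K`; the function names `project_points` and `assign_grid_points` and the toy values are hypothetical.

```python
import numpy as np

def project_points(points_cam, K):
    """Project 3D points (camera coordinates, z > 0) to pixel coordinates.

    Assumption: a plain pinhole model with intrinsics K (3x3).
    """
    uvw = points_cam @ K.T              # each row becomes K @ p
    return uvw[:, :2] / uvw[:, 2:3]     # perspective divide -> (u, v)

def assign_grid_points(token_uv, grid_cam, K):
    """For each image token center (u, v), return the index of the grid
    point whose projection is closest in the 2D image plane."""
    grid_uv = project_points(grid_cam, K)
    # Squared 2D distances, shape (num_tokens, num_grid_points)
    d2 = ((token_uv[:, None, :] - grid_uv[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

# Toy setup (hypothetical numbers, for illustration only)
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
grid_cam = np.array([[0.0, 0.0, 10.0],
                     [1.0, 0.0, 10.0],
                     [0.0, 1.0, 10.0],
                     [-1.0, 0.0, 10.0]])
token_uv = np.array([[50.0, 50.0],
                     [61.0, 50.0],
                     [50.0, 59.0]])

idx = assign_grid_points(token_uv, grid_cam, K)
token_xyz = grid_cam[idx]   # 3D position borrowed from the matched grid point
```

No depth is predicted anywhere in this sketch: each token simply inherits the 3D coordinates of the grid point that lands closest to it in the image, which is where the error mentioned above comes from.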

zlenyk commented 7 months ago

Do I understand correctly that, for a given set of camera extrinsic parameters (that is, for a given dataset), the same depth is always assigned to a given pixel, regardless of what the camera or lidar actually sees?

nnnth commented 7 months ago

Yes, your understanding is correct. This is a somewhat rudimentary rule-based mapping approach, as we prefer not to introduce parameters for depth estimation.
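
To make the consequence concrete, the sketch below builds such a rule-based lookup from calibration alone. It is an illustration under assumptions, not the repository's implementation: the grid points are taken to be in camera coordinates (so extrinsics are omitted), a pinhole intrinsic matrix `K` is assumed, and `build_token_depth_lookup` is a hypothetical name.

```python
import numpy as np

def build_token_depth_lookup(token_uv, grid_cam, K):
    """Return one depth per image token, using only calibration and a
    fixed grid. Note that no image or lidar data is an input here."""
    uvw = grid_cam @ K.T
    grid_uv = uvw[:, :2] / uvw[:, 2:3]            # projected grid points
    d2 = ((token_uv[:, None, :] - grid_uv[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)                   # 2D nearest neighbor
    return grid_cam[nearest, 2]                   # z of the matched point

# Hypothetical calibration and grid, for illustration only
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
grid_cam = np.array([[0.0, 0.0, 5.0],
                     [1.0, 0.0, 10.0],
                     [0.0, 1.0, 20.0]])
token_uv = np.array([[50.0, 50.0],
                     [60.0, 50.0],
                     [50.0, 56.0]])

# Computed once per camera setup and reused for every frame
depth_lookup = build_token_depth_lookup(token_uv, grid_cam, K)

# Rebuilding it gives the same result, since scene content never enters
depth_again = build_token_depth_lookup(token_uv, grid_cam, K)
```

Because the function takes only token positions, the grid, and the camera parameters, the table can be cached per camera setup, and every frame receives the same per-token depths.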