EnVision-Research / Lift3D

CVPR 2023. Use NeRF-generated images to train your model.
73 stars · 4 forks

Camera pose from 3D bounding box parameter #2

Closed SrinjaySarkar closed 10 months ago

SrinjaySarkar commented 1 year ago

Thank you for your work on Lift3D. In the supplementary material of your paper, you mention that you obtain the object (car) pose from the 3D bounding box parameters.

The final sampling pose P′ can be written as (x, y, z, l, w, h, θ), where x, y, z is the position of the 3D bounding box, l, w, h represent the length, width, and height of the bounding box, and θ is the rotation about the y axis.

Could you please explain, or provide some code showing, how you did this? Thanks.

Len-Li commented 1 year ago

Hi,

I model the NeRF by tightly bounding it with a 3D box. During ray casting, only points inside the box are processed by the NeRF network. This approach was first proposed by neural-scene-graphs. You can check out the ray-box intersection algorithm, which does exactly this.
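For reference, the standard ray-box test referred to here is the slab method: intersect the ray with each pair of axis-aligned planes and keep the overlapping interval. A minimal illustrative sketch (the function name and exact API are mine, not the repo's):

```python
import numpy as np

def ray_aabb_intersect(origin, direction, box_min, box_max):
    """Slab-method ray/AABB intersection.

    Returns (t_near, t_far) if the ray hits the box, else None.
    Assumes `direction` has no exactly-zero components (add a small
    epsilon in practice to avoid division by zero).
    """
    inv_d = 1.0 / direction
    t0 = (box_min - origin) * inv_d
    t1 = (box_max - origin) * inv_d
    # Entry is the latest entry across the three slabs; exit the earliest exit
    t_near = np.max(np.minimum(t0, t1))
    t_far = np.min(np.maximum(t0, t1))
    if t_far < max(t_near, 0.0):
        return None
    return t_near, t_far

# NeRF samples are then drawn only from t in [t_near, t_far] along the ray
origin = np.array([0.0, 0.0, -5.0])
direction = np.array([0.2, 0.1, 1.0])
hit = ray_aabb_intersect(origin, direction,
                         np.array([-1.0, -1.0, -1.0]),
                         np.array([1.0, 1.0, 1.0]))
```

Rays that return None can be skipped entirely, which is what restricts the NeRF to the object's bounding box.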

SrinjaySarkar commented 1 year ago

Thanks for the reply and the resources. Did you train your network on the KITTI object detection dataset or the KITTI tracking dataset? Looking forward to the code release.

Len-Li commented 1 year ago

Hi, I did not train on the KITTI dataset. I train my network on StyleGAN-generated images and augment the objects into the KITTI or nuScenes datasets. Note that our setting differs from neural-scene-graphs: they perform a reconstruction task, while Lift3D is an unconditional generation framework.

SrinjaySarkar commented 1 year ago

Hi, sorry for the confusion. Thanks for the explanation. Looking forward to your code.

SrinjaySarkar commented 1 year ago

Hi @Len-Li, could you please point out the function that derives the camera pose from the bounding box parameters? In infer.py, the camera pose is obtained using the get_campara_blender function.

Len-Li commented 1 year ago

Hi, sorry for the late reply. The camera pose is actually (x, y, z) = (0, 0, 0) with an identity rotation matrix. Since we only model the environment in a single frame, we do not need to model an additional camera pose.