Open Maro1 opened 1 year ago
Hi, we zero-centered every point cloud and adjusted the camera extrinsics accordingly. Depending on the camera trajectory around the object, the centroid of the camera centers will usually be a bit above (0, 0, 0), since most turkers captured the objects from above. Finding a canonical rotation is very hard for most object categories. Regardless, we found that many objects can be put "upright" by using the following function:
import copy
from typing import Tuple

import torch
from pytorch3d.renderer.cameras import CamerasBase
from pytorch3d.structures import Pointclouds
from pytorch3d.transforms import so3_exp_map


def adjust_scene_scale_up_vector(
    cameras: CamerasBase,
    pcl: Pointclouds,
    to_vec=(0.0, -1.0, 0.0),
    # in most cases, corresponds to the ground plane normal in CO3Dv2 scenes
    from_vec=(-0.0396, -0.8306, -0.5554),
    # scale applied to points and camera translations; 1.0 keeps the scale unchanged
    rescale_factor: float = 1.0,
) -> Tuple[CamerasBase, Pointclouds]:
    """
    Rotates the up vector of input cameras and pointcloud to desired direction.
    """
    device = cameras.device
    T_adjust = torch.zeros(3, device=device)
    # The cross product is used directly as an axis-angle vector. For unit
    # inputs its norm is sin(angle), so the rotation angle is approximate.
    rot_axis_angle = torch.cross(
        torch.tensor(to_vec, dtype=torch.float32),
        torch.tensor(from_vec, dtype=torch.float32),
        dim=-1,
    ).to(device)
    R_adjust = so3_exp_map(rot_axis_angle[None])[0]

    # adjust point cloud: translate, rescale, then rotate
    pcl = pcl.update_padded(pcl.points_padded() + T_adjust)
    pcl = pcl.update_padded(rescale_factor * pcl.points_padded())
    pcl = pcl.update_padded(pcl.points_padded() @ R_adjust[None])

    # adjust cameras so that they stay consistent with the transformed points
    cameras_a = copy.deepcopy(cameras)
    align_t_R = R_adjust.t()
    align_t_T = -rescale_factor * T_adjust[None] @ align_t_R
    align_t_s = rescale_factor
    cameras_a.T = (
        torch.bmm(
            align_t_T[:, None].repeat(cameras_a.R.shape[0], 1, 1),
            cameras_a.R,
        )[:, 0]
        + cameras_a.T * align_t_s
    )
    cameras_a.R = torch.bmm(
        align_t_R[None].expand_as(cameras_a.R),
        cameras_a.R,
    )
    return cameras_a, pcl
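To make the up-vector rotation concrete, here is a minimal, dependency-free sketch in plain Python (not PyTorch3D code) that builds a rotation taking one direction exactly onto another with Rodrigues' formula. The helper names `rotation_between` and `apply` are made up for this illustration. It uses the sine and cosine of the angle directly, whereas the function above feeds the raw cross product to `so3_exp_map`. It also uses the column-vector convention (`R @ v`); PyTorch3D multiplies row vectors on the right (`points @ R`), so the transpose of this matrix would be the analogue there.

```python
import math


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def rotation_between(from_vec, to_vec):
    """Return a 3x3 matrix R (nested lists) such that R @ from_vec is
    parallel to to_vec. Hypothetical helper, column-vector convention."""
    a, b = normalize(from_vec), normalize(to_vec)
    axis = cross(a, b)
    sin_t = math.sqrt(dot(axis, axis))  # |a x b| = sin(angle) for unit vectors
    cos_t = dot(a, b)
    if sin_t < 1e-12:
        if cos_t > 0:
            # Vectors already aligned: identity rotation.
            return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
        raise ValueError("vectors are antiparallel; the rotation axis is ambiguous")
    k = tuple(x / sin_t for x in axis)  # unit rotation axis
    # Skew-symmetric matrix K with K @ v == cross(k, v).
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    # Rodrigues' formula: R = I + sin(t) K + (1 - cos(t)) K^2.
    return [
        [(1.0 if i == j else 0.0) + sin_t * K[i][j] + (1.0 - cos_t) * K2[i][j] for j in range(3)]
        for i in range(3)
    ]


def apply(R, v):
    """Multiply the 3x3 matrix R with the column vector v."""
    return tuple(dot(row, v) for row in R)


from_vec = (-0.0396, -0.8306, -0.5554)  # default from_vec of the function above
to_vec = (0.0, -1.0, 0.0)               # default to_vec of the function above
R = rotation_between(from_vec, to_vec)
rotated = apply(R, normalize(from_vec))
```

Here `rotated` should coincide with `to_vec` up to floating-point error, which is a quick sanity check before applying a similar rotation to a whole scene.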
Hi, how should rescale_factor be computed?
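The thread does not say how rescale_factor was obtained. If no rescaling is wanted, a value of 1.0 leaves points and camera translations unchanged. If the goal is only to bring scenes to a comparable size, one possible convention (an assumption for illustration, not the dataset authors' method) is to scale a zero-centered point cloud so that its farthest point lies on the unit sphere. The helper name `unit_sphere_rescale_factor` is hypothetical:

```python
import math


def unit_sphere_rescale_factor(points):
    """Return s such that the farthest point of s * points has norm 1.

    Assumes `points` is an iterable of (x, y, z) tuples that is already
    zero-centered. Illustrative convention only, not taken from CO3D.
    """
    max_norm = max(math.sqrt(x * x + y * y + z * z) for x, y, z in points)
    if max_norm == 0.0:
        raise ValueError("all points are at the origin; scale is undefined")
    return 1.0 / max_norm


points = [(3.0, 4.0, 0.0), (0.0, 0.0, 1.0)]
factor = unit_sphere_rescale_factor(points)  # farthest point has norm 5
```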
When visualizing the camera extrinsics of some objects, it seems to me that the origin is not at (0, 0, 0) and that the reference frame is offset (the circle formed by all the cameras is offset by an angle). I am trying to find the absolute position (and preferably rotation) of the cameras, so I am wondering whether this information is contained in the dataset.
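For inspecting camera positions, it may help to recall the PyTorch3D convention: a world point maps to camera coordinates as `X_cam = X_world @ R + T`, so the camera center in world coordinates is `C = -T @ R^T`. Below is a minimal plain-Python sketch of that formula and of the centroid of several camera centers. The helper names `camera_center` and `centroid` are made up for this illustration, and the example extrinsics are arbitrary. With PyTorch3D camera objects, `cameras.get_camera_center()` should give the same quantity.

```python
def camera_center(R, T):
    """Camera center C = -T @ R^T for the row-vector convention
    X_cam = X_world @ R + T. R is a 3x3 nested list, T a 3-tuple."""
    # (T @ R^T)[i] = sum_j T[j] * R[i][j]
    return tuple(-sum(T[j] * R[i][j] for j in range(3)) for i in range(3))


def world_to_camera(X, R, T):
    """Apply X_cam = X_world @ R + T to a single point X."""
    return tuple(sum(X[i] * R[i][k] for i in range(3)) + T[k] for k in range(3))


def centroid(points):
    """Mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


# Arbitrary example extrinsics: a 90-degree rotation about z plus a translation.
R = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = (1.0, 2.0, 3.0)

C = camera_center(R, T)
# The camera center must map to the origin of the camera frame.
check = world_to_camera(C, R, T)

# Centroid of two camera centers; the second camera has identity rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
centers = [C, camera_center(identity, (0.0, 0.0, 5.0))]
mean_center = centroid(centers)
```

Comparing `mean_center` against (0, 0, 0) for a real sequence is one way to see how far the camera ring sits from the point cloud origin.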