concept-graphs / concept-graphs

Official code release for ConceptGraphs

How to generate synchronized visualization - animate_mapping_save.py #20

Closed ibrahimyousri closed 9 months ago

ibrahimyousri commented 10 months ago

I ran the animate_mapping_save.py script, but the bottom point cloud views are not synchronized with the top POV frame images: https://drive.google.com/file/d/1Hlg5o3z1aXD7C03Co4iDw3jrRyVRqlQj/view?usp=sharing

How can I make it all synchronized like the main project visualization?

georgegu1997 commented 10 months ago

Thanks for the question!

This is probably due to the difference in coordinate conventions between Unity/AI2-THOR and Open3D. The following function may be helpful.

import numpy as np
from scipy.spatial.transform import Rotation

def adjust_ai2thor_pose(pose):
    '''
    Adjust the camera pose from the one used in Unity to that in Open3D.
    '''
    # Transformation matrix to flip Y-axis
    flip_y = np.array([
        [1, 0, 0, 0],
        [0, -1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]
    ])

    # Separate rotation and translation
    rotation = pose[:3, :3]
    translation = pose[:3, 3]

    # Adjust rotation and translation separately
    adjusted_rotation = flip_y[:3, :3] @ rotation @ flip_y[:3, :3]
    adjusted_translation = flip_y[:3, :3] @ translation

    # Reconstruct the adjusted camera pose
    adjusted_pose = np.eye(4)
    adjusted_pose[:3, :3] = adjusted_rotation
    adjusted_pose[:3, 3] = adjusted_translation

    # Additionally rotate the whole pose by 180 degrees about the X-axis
    R = Rotation.from_euler('x', 180, degrees=True).as_matrix()
    R_homogeneous = np.eye(4)
    R_homogeneous[:3, :3] = R

    adjusted_pose = R_homogeneous @ adjusted_pose

    return adjusted_pose
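
As a quick sanity check (a sketch, not part of the repo), the adjusted pose should remain a proper rigid transform, which is why the Y-flip is applied as a conjugation rather than a plain axis flip:

import numpy as np
from scipy.spatial.transform import Rotation

# Hypothetical example pose: a 30-degree yaw plus an arbitrary translation.
example_pose = np.eye(4)
example_pose[:3, :3] = Rotation.from_euler('y', 30, degrees=True).as_matrix()
example_pose[:3, 3] = [1.0, 2.0, 3.0]

adjusted = adjust_ai2thor_pose(example_pose)
# The rotation block should still have determinant +1 (no reflection).
assert np.isclose(np.linalg.det(adjusted[:3, :3]), 1.0)
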
ibrahimyousri commented 10 months ago

Hello @georgegu1997, thanks for your prompt response! I flipped the scene (for better visualization) in datasets_common.py and applied your function to the camera_pose variable in animate_mapping_save.py, changing

camera_pose = frame["camera_pose"]

to

newpose = adjust_ai2thor_pose(frame["camera_pose"])
camera_pose = newpose

Your function gives me a better visualization (https://drive.google.com/file/d/1QmhtjCRKzTqOQiGRkXx-C0aCY9fbvSji/view?usp=sharing), but it is still not synchronized and I need to explore the scene with the mouse.

This is my data loader code in datasets_common.py:

# Note: the usual datasets_common.py imports (os, glob, numpy as np, torch,
# natsorted, Optional, GradSLAMDataset) are assumed to be in scope here.
class ai2thor(GradSLAMDataset):
    def __init__(
        self,
        config_dict,
        basedir,
        sequence,
        stride: Optional[int] = None,
        start: Optional[int] = 0,
        end: Optional[int] = -1,
        desired_height: Optional[int] = 1000,
        desired_width: Optional[int] = 1000,
        load_embeddings: Optional[bool] = False,
        embedding_dir: Optional[str] = "embeddings",
        embedding_dim: Optional[int] = 512,
        **kwargs,
    ):
        self.input_folder = os.path.join(basedir, sequence)
        self.pose_path = os.path.join(self.input_folder, "traj.txt")
        super().__init__(
            config_dict,
            stride=stride,
            start=start,
            end=end,
            desired_height=desired_height,
            desired_width=desired_width,
            load_embeddings=load_embeddings,
            embedding_dir=embedding_dir,
            embedding_dim=embedding_dim,
            **kwargs,
        )

    def get_filepaths(self):
        color_paths = natsorted(glob.glob(f"{self.input_folder}/results/frame*.jpg"))
        depth_paths = natsorted(glob.glob(f"{self.input_folder}/results/depth*.png"))
        embedding_paths = None
        if self.load_embeddings:
            embedding_paths = natsorted(
                glob.glob(f"{self.input_folder}/{self.embedding_dir}/*.pt")
            )
        return color_paths, depth_paths, embedding_paths

    def load_poses(self):
        poses = []
        with open(self.pose_path, "r") as f:
            lines = f.readlines()
        for i in range(self.num_imgs):
            line = lines[i]
            c2w = np.array(list(map(float, line.split()))).reshape(4, 4)
            # Flip the Y axis of the camera-to-world pose (the scene flip mentioned above)
            c2w[:3, 1] *= -1
            #c2w[:3, 2] *= -1
            c2w = torch.from_numpy(c2w).float()
            poses.append(c2w)
        return poses

    def read_embedding_from_file(self, embedding_file_path):
        embedding = torch.load(embedding_file_path)
        return embedding.permute(0, 2, 3, 1)  # (1, H, W, embedding_dim)
georgegu1997 commented 9 months ago

According to the video, it seems the chirality of the point cloud is mismatched with the images: the orange couch is on the left side of the TV in the images but on the right side in the point clouds.

I think this is probably because you haven't applied the adjust_ai2thor_pose transformation consistently in different scripts. Please check whether you should also use this function in the object-based mapping or other scripts.
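
For example, a minimal sketch (assuming adjust_ai2thor_pose can be imported or defined in datasets_common.py) of applying the adjustment once inside load_poses, so that every downstream script receives already-adjusted poses instead of relying on ad hoc axis flips:

    def load_poses(self):
        poses = []
        with open(self.pose_path, "r") as f:
            lines = f.readlines()
        for i in range(self.num_imgs):
            c2w = np.array(list(map(float, lines[i].split()))).reshape(4, 4)
            # Replace the manual c2w[:3, 1] *= -1 flip with the full adjustment,
            # so mapping and visualization scripts all use the same convention.
            c2w = adjust_ai2thor_pose(c2w)
            poses.append(torch.from_numpy(c2w).float())
        return poses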

ibrahimyousri commented 9 months ago

Thank you @georgegu1997, I managed to fix it.