eth-ait / 4d-dress

Official repository for CVPR 2024 highlight paper 4D-DRESS: A 4D Dataset of Real-world Human Clothing with Semantic Annotations.
https://eth-ait.github.io/4d-dress/

Problem with cameras. Offset is None #3

Open Andreus00 opened 5 months ago

Andreus00 commented 5 months ago

Hi,

First of all, thank you for your work!!!

I am trying to load camera parameters from the cameras.pkl file, and the mesh as a point cloud to use with Gaussian Splatting.

To do so, I wrote this:

```python
import os
import numpy as np
from PIL import Image
# open_pickle, load_scan_mesh, open_image, mesh_to_pointcloud, CameraInfo,
# SceneInfo and getNerfppNorm are project-local helpers.


def read4dDressInfo(cfg) -> SceneInfo:
    # read basic_info.pkl
    basic_info = open_pickle(cfg.source_path, "basic_info.pkl")
    scan_frames = basic_info['scan_frames']
    scan_rot = basic_info['rotation']
    offset = basic_info['offset']

    # load scan mesh
    _, scan_mesh, _, _ = load_scan_mesh(cfg.obj_path, rotation=scan_rot, offset=offset)

    # load cameras from cameras.pkl
    cameras = open_pickle(cfg.source_path, "Capture", "cameras.pkl")

    # for each camera, read its image, mask and parameters,
    # and create a CameraInfo object
    cam_infos = []
    for key, camera in cameras.items():
        camera_intrinsics = camera['intrinsics']
        camera_extrinsics = camera['extrinsics']

        R = camera_extrinsics[:3, :3]
        R = np.transpose(R)
        T = camera_extrinsics[:3, 3]

        image_path = os.path.join(cfg.source_path, "Capture", key, "images", "capture-f00011.png")
        image = open_image(image_path)
        mask_path = os.path.join(cfg.source_path, "Capture", key, "masks", f"mask-f{scan_frames[0]}.png")
        mask = open_image(mask_path)

        f_x = camera_intrinsics[0, 0]
        f_y = camera_intrinsics[1, 1]
        width = image.size[0]
        height = image.size[1]

        fov_x = 2 * np.arctan(width / (2 * f_x))
        fov_y = 2 * np.arctan(height / (2 * f_y))

        # composite the capture onto a white background using the mask
        alpha_image = Image.new("RGBA", image.size, (255, 255, 255, 255))
        alpha_image.paste(image, (0, 0), mask)
        print(np.asarray(alpha_image).shape)

        cam_info = CameraInfo(
            uid=key,
            R=R,
            T=T,
            FovY=fov_y,
            FovX=fov_x,
            image=alpha_image,
            image_path=image_path,
            image_name=f"side-{key}-capture-f{scan_frames[0]}",
            width=width,
            height=height
        )
        cam_infos.append(cam_info)

    # get NeRF++ normalization
    nerf_normalization = getNerfppNorm(cam_infos)

    # Create a point cloud from the mesh: this simply takes vertices and normals
    # from the mesh and packs them into a BasicPointCloud object; the colors of
    # the Gaussians are set to the normals.
    scales, opacity, pcd = mesh_to_pointcloud(scan_mesh)

    images_dir = os.path.join(cfg.source_path, cfg.subj, cfg.outfit, cfg.seq)
    ply_path = os.path.join(images_dir, "points3d.ply")

    return scales, opacity, SceneInfo(
        point_cloud=pcd,
        train_cameras=cam_infos,
        test_cameras=[],
        nerf_normalization=nerf_normalization,
        ply_path=ply_path
    )
```
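
A renderer-independent way to debug such a misalignment is to project the scan vertices with the raw intrinsics/extrinsics and overlay them on the capture image. Below is a minimal sketch, assuming `cameras.pkl` stores an OpenCV-style world-to-camera `[R|t]` extrinsic with a standard 3x3 `K`, and that `scan_mesh` exposes a `vertices` array (e.g. a trimesh mesh):

```python
import numpy as np
from PIL import Image

def project_vertices(vertices, K, extrinsic):
    """Project Nx3 world-space points to pixels, assuming an OpenCV-style w2c [R|t]."""
    R, t = extrinsic[:3, :3], extrinsic[:3, 3]
    cam = vertices @ R.T + t                   # world -> camera space
    uv = (cam @ K.T)[:, :2] / cam[:, 2:3]      # apply K, then perspective divide
    return uv, cam[:, 2]                       # pixel coordinates and depth

# inside the camera loop above:
uv, depth = project_vertices(np.asarray(scan_mesh.vertices), camera_intrinsics, camera_extrinsics)
overlay = np.asarray(image.convert("RGB")).copy()
h, w = overlay.shape[:2]
ok = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
overlay[uv[ok, 1].astype(int), uv[ok, 0].astype(int)] = (255, 0, 0)  # red dots
Image.fromarray(overlay).save(f"projection_check_{key}.png")
```

If the red dots land on the subject, the raw parameters are consistent with the images and the discrepancy comes from the renderer's camera conventions.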

However, I am running into a problem: the point cloud (created from the mesh) and the images are misaligned. I plotted both the capture image and the Gaussian Splatting render; here are two examples of the misalignment:

[two screenshots comparing the capture image with the GS render]

If I remove the rotation from the mesh, both subjects face the same direction, but they remain misaligned:

[two more screenshots]

Am I missing something?

Thank you.

P.S. I noticed that `basic_info['offset']` is always None for the sample I am using (0112 - Inner - Take2). Could that be the problem?
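
For what it's worth, a minimal fallback one could try, assuming a `None` offset is simply meant as "no translation" (whether that matches the dataset's intent is exactly the question above):

```python
import numpy as np

# hypothetical fallback: treat a missing offset as a zero translation,
# assuming load_scan_mesh expects a 3-vector rather than None
offset = basic_info['offset']
if offset is None:
    offset = np.zeros(3, dtype=np.float32)
```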

Andreus00 commented 5 months ago

Update: setting `mcentral` to False improves the alignment a lot, but all the cameras still seem shifted slightly to the left:

[two screenshots]
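
One hypothesis worth checking for a constant shift like this: the reference Gaussian Splatting code builds its projection matrix from FovX/FovY alone, implicitly assuming the principal point sits at the image center, so an off-center (c_x, c_y) in `cameras.pkl` would shift every render by the same amount. A quick check, using the variables from the loader above:

```python
# if this offset is non-zero, a renderer that assumes cx = width/2 and
# cy = height/2 (as stock 3DGS does) will show a constant image shift
cx, cy = camera_intrinsics[0, 2], camera_intrinsics[1, 2]
print(f"principal-point offset: dx = {cx - width / 2:+.1f} px, dy = {cy - height / 2:+.1f} px")
```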

azuxmioy commented 5 months ago

Hi all, thanks for the message.

I might need some time to investigate this issue due to the upcoming CVPR conference.

At the same time, if @WenbWa has any ideas, please feel free to comment.

Thanks.

azuxmioy commented 5 months ago

btw, what rendering pipeline are you using?

have you tried our demo code here? https://github.com/eth-ait/4d-dress/blob/baf3e8f0857f7b22996512ba82a55c9530f268ce/dataset/extract_garment.py

Andreus00 commented 5 months ago

> btw, what rendering pipeline are you using?

I am using Gaussian Splatting's rendering pipeline.

> have you tried our demo code here? https://github.com/eth-ait/4d-dress/blob/baf3e8f0857f7b22996512ba82a55c9530f268ce/dataset/extract_garment.py

I used it to load my cameras during my tests, but it did not work. I removed lines 23 and 25 to make it compatible with GS's cameras, and used width/height instead of p_x/p_y since those were not working; the remaining lines are based on the code you linked.

P.S. I bypassed the problem by creating my own cameras and synthesizing images from the mesh. However, I will probably still need to understand how to align the real images to the mesh later.

WenbWa commented 4 months ago

Hi, thanks for your interest in our 4D-DRESS dataset.

The cameras.pkl files are mainly used in our work to build PyTorch3D cameras. Here is the PyTorch3D camera-loading function used in dataset/extract_garment.py:

```python
import torch
from pytorch3d.renderer import PerspectiveCameras, RasterizationSettings

# load pytorch3d cameras from parameters: intrinsics, extrinsics
def load_pytorch_cameras(camera_params, camera_list, image_shape):
    # init camera_dict
    camera_dict = dict()
    # process all camera within camera_list
    for camera_id in camera_list:
        # assign camera intrinsic and extrinsic matrices
        intrinsic = torch.tensor((camera_params[camera_id]["intrinsics"]), dtype=torch.float32).cuda()
        extrinsic = torch.tensor(camera_params[camera_id]["extrinsics"], dtype=torch.float32).cuda()
        # assign camera image size
        image_size = torch.tensor([image_shape[0], image_shape[1]], dtype=torch.float32).unsqueeze(0).cuda()

        # assign camera parameters
        f_xy = torch.cat([intrinsic[0:1, 0], intrinsic[1:2, 1]], dim=0).unsqueeze(0)
        p_xy = intrinsic[:2, 2].unsqueeze(0)
        R = extrinsic[:, :3].unsqueeze(0)
        T = extrinsic[:, 3].unsqueeze(0)
        # coordinate system adaption to PyTorch3D
        R[:, :2, :] *= -1.0
        # camera position in world space -> world position in camera space
        T[:, :2] *= -1.0
        R = torch.transpose(R, 1, 2)  # row-major
        # assign Pytorch3d PerspectiveCameras
        camera_dict[camera_id] = PerspectiveCameras(focal_length=f_xy, principal_point=p_xy, R=R, T=T, in_ndc=False, image_size=image_size).cuda()
    # assign Pytorch3d RasterizationSettings
    raster_settings = RasterizationSettings(image_size=image_shape, blur_radius=0.0, faces_per_pixel=1, max_faces_per_bin=80000)
    return camera_dict, raster_settings
```

The camera conventions used in Gaussian Splatting may differ slightly from the PyTorch3D ones.
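
For illustration, here is how a single OpenCV-style world-to-camera `[R|t]` (which the loader above appears to assume, given its sign flips) maps into each convention; a minimal sketch, not part of the dataset code:

```python
import numpy as np

def split_conventions(extrinsic):
    """Map an OpenCV-style world-to-camera [R|t] (3x4) into the forms expected
    by PyTorch3D and by the reference Gaussian Splatting loaders."""
    R_w2c, t = extrinsic[:3, :3], extrinsic[:3, 3]

    # PyTorch3D: +X left, +Y up, +Z forward, row-vector convention;
    # this mirrors the sign flips and transpose in load_pytorch_cameras above
    R_p3d = R_w2c.copy()
    R_p3d[:2, :] *= -1.0
    R_p3d = R_p3d.T
    T_p3d = t.copy()
    T_p3d[:2] *= -1.0

    # Gaussian Splatting reference loaders (COLMAP/OpenCV axes: +X right,
    # +Y down, +Z forward): store the transposed w2c rotation and the raw t
    R_gs = R_w2c.T
    T_gs = t.copy()
    return (R_p3d, T_p3d), (R_gs, T_gs)
```

Both forms come from the same matrix; if the scene still renders shifted after matching these conventions, the remaining suspects are the principal point and the mesh offset discussed above.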