geopavlakos / hamer

HaMeR: Reconstructing Hands in 3D with Transformers
https://geopavlakos.github.io/hamer/
MIT License

Transform Pyrender to Pytorch3d #22

Closed: haonanhe closed this issue 6 months ago

haonanhe commented 6 months ago

Hi, thank you for this amazing work!

I am trying to use the pytorch3d renderer instead of the pyrender renderer to render the hand mesh in the `__call__` method of `hamer/hamer/utils/renderer.py`. The code is shown below. However, the hand mesh is not rendered onto the image plane:

[image: rendered result, with the hand mesh missing from the output]

Do you have any suggestions? I think I have set the camera parameters to be the same as in the original setup.

```python
# Imports assumed by this snippet (not part of the original post):
from typing import Optional

import cv2
import numpy as np
import torch
from pytorch3d.renderer import (
    BlendParams,
    MeshRasterizer,
    MeshRenderer,
    PerspectiveCameras,
    PointLights,
    RasterizationSettings,
    SoftPhongShader,
    TexturesVertex,
)
from pytorch3d.structures import Meshes
from pytorch3d.transforms import RotateAxisAngle

def __call__(
    self,
    vertices: np.ndarray,
    camera_translation: np.ndarray,
    image: torch.Tensor,
    full_frame: bool = False,
    imgname: Optional[str] = None,
    side_view: bool = False,
    rot_angle: float = 90,
    mesh_base_color=(1.0, 1.0, 0.9),
    scene_bg_color=(0, 0, 0),
    return_rgba: bool = False,
):
    # Preprocess the image
    if full_frame:
        image = cv2.imread(imgname).astype(np.float32)[:, :, ::-1] / 255.
    else:
        image = image.clone() * torch.tensor(self.cfg.MODEL.IMAGE_STD, device=image.device).reshape(3,1,1)
        image = image + torch.tensor(self.cfg.MODEL.IMAGE_MEAN, device=image.device).reshape(3,1,1)
        image = image.permute(1, 2, 0).cpu().numpy()

    vertices = torch.tensor(vertices).cuda()
    faces = torch.tensor(self.faces.copy()).cuda()
    mesh_base_color = torch.tensor(mesh_base_color).view(1, 1, 3).cuda()
    textures = TexturesVertex(verts_features=torch.ones_like(vertices).unsqueeze(0) * mesh_base_color)
    mesh = Meshes(verts=[vertices], faces=[faces], textures=textures)

    # apply the 180-degree rotation about the x-axis used by the original pyrender renderer
    rotate_transform = RotateAxisAngle(angle=180, axis="X").cuda()
    mesh = mesh.update_padded(rotate_transform.transform_points(mesh.verts_padded()))

    # original camera pose from pyrender
    camera_translation[0] *= -1.
    camera_pose = np.eye(4)
    camera_pose[:3, 3] = camera_translation

    # flip the x- and z-axes to move from the OpenGL camera convention to PyTorch3D's
    z_flip = np.array([
        [-1, 0,  0, 0],
        [0, 1,  0, 0],
        [0, 0, -1, 0],
        [0, 0,  0, 1]
    ])

    camera_pose = z_flip @ camera_pose

    R = torch.from_numpy(camera_pose[:3, :3]).unsqueeze(0).cuda()
    T = torch.from_numpy(camera_pose[:3, 3]).unsqueeze(0).cuda()

    # Cameras
    cameras = PerspectiveCameras(
        device="cuda",
        focal_length=((self.focal_length, self.focal_length),),
        principal_point=((image.shape[0] / 2., image.shape[1] / 2.),),
        R=R,
        T=T,
    )

    # Rasterization settings
    raster_settings = RasterizationSettings(
        image_size=(image.shape[0], image.shape[1]),
        blur_radius=0.0,
        faces_per_pixel=1,
    )

    # lights 
    lights = PointLights(device='cuda', location=[[0.0, 0.0, 3.0]])

    # background blending parameters
    blend_params = BlendParams(background_color=(0, 0, 0)) 

    # renderer
    renderer = MeshRenderer(
        rasterizer=MeshRasterizer(
            cameras=cameras,
            raster_settings=raster_settings
        ),
        shader=SoftPhongShader(
            device='cuda',
            cameras=cameras,
            lights=lights,
            blend_params=blend_params
        )
    )

    rendered_images = renderer(mesh)

    rendered_images = rendered_images[0].detach().cpu().numpy()
    rendered_images = rendered_images.astype(np.float32) / 255.0

    valid_mask = (rendered_images[:, :, -1])[:, :, np.newaxis]
    if not side_view:
        output_img = (rendered_images[:, :, :3] * valid_mask + (1 - valid_mask) * image)
    else:
        output_img = rendered_images[:, :, :3]

    return output_img
```
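
A side point worth checking in the snippet above: PyTorch3D's `SoftPhongShader` returns float RGBA images whose values are already in [0, 1], so the extra division by 255.0 would push the render toward black even when the mesh does project into view. A minimal sanity check, assuming the `renderer` and `mesh` built above:

```python
# Quick sanity check on the renderer output, assuming `renderer` and `mesh`
# from the snippet above. SoftPhongShader outputs RGBA floats in [0, 1], so
# a further division by 255 would crush the image toward black.
rgba = renderer(mesh)[0].detach().cpu().numpy()
print("RGB value range:", rgba[..., :3].min(), rgba[..., :3].max())
print("pixels covered by the mesh:", int((rgba[..., 3] > 0).sum()))
```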
geopavlakos commented 6 months ago

I haven't used pytorch3d in the past, but I wouldn't be surprised if this happens because of different conventions for the camera coordinate system between pyrender and pytorch3d. If you are able to make rendering with pytorch3d work, please feel free to post your solution for others to see.
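
For reference, the conventions do differ: pyrender follows the OpenGL camera convention (+x right, +y up, camera looking down -z), while PyTorch3D cameras assume +x left, +y up, and the camera looking down +z. In addition, `PerspectiveCameras` interprets `focal_length` and `principal_point` in NDC coordinates by default, so pixel-valued intrinsics like the ones above need `in_ndc=False` together with an explicit `image_size`. Below is a minimal sketch of a screen-space camera setup; the crop size, focal length, and translation are placeholder values, and the sign flips may need adjusting for a specific pose:

```python
import torch
from pytorch3d.renderer import PerspectiveCameras

H, W = 256, 256                          # crop size (placeholder)
focal = 5000.0                           # focal length in pixels (placeholder)
cam_t = torch.tensor([0.0, 0.0, 2.5])    # camera translation (placeholder)

# pyrender/OpenGL cameras look down -z with +x to the right; PyTorch3D cameras
# look down +z with +x to the left, so flip the x- and z-axes of the pose.
flip = torch.diag(torch.tensor([-1.0, 1.0, -1.0]))
R = flip.unsqueeze(0)                    # (1, 3, 3) world-to-view rotation
T = (flip @ cam_t).unsqueeze(0)          # (1, 3) world-to-view translation

cameras = PerspectiveCameras(
    in_ndc=False,                            # intrinsics given in pixel (screen) space
    focal_length=((focal, focal),),
    principal_point=((W / 2.0, H / 2.0),),   # (cx, cy): x first, then y
    image_size=((H, W),),                    # (height, width)
    R=R,
    T=T,
)
```

Projecting the vertices directly with `cameras.transform_points_screen(...)` and comparing against the expected 2D locations is a quick way to confirm the convention before involving the full renderer.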