facebookresearch / frankmocap

A Strong and Easy-to-use Single View 3D Hand+Body Pose Estimator

The loss of hand axis angle pose will make the effect worse #132

Closed youngstu closed 3 years ago

youngstu commented 3 years ago

I reproduced the hand training module and found that the axis-angle pose loss may make the results worse. I have verified that the data is correct. After the axis-angle loss is added, the hand often flips forward and backward.

import torch
import torch.nn as nn

class ManoLoss:
    def __init__(
            self,
            lambda_pose=100.0,
            lambda_shape=100.0,
            lambda_joint3d=1.0,
            lambda_kp2d=1.0,
    ):
        self.lambda_pose = lambda_pose
        self.lambda_shape = lambda_shape
        self.lambda_joint3d = lambda_joint3d
        self.lambda_kp2d = lambda_kp2d

        self.criterion_pose = nn.MSELoss().cuda()
        self.criterion_shape = nn.MSELoss().cuda()
        self.criterion_joint3d = nn.MSELoss().cuda()
        self.criterion_kp2d = nn.MSELoss().cuda()

    def compute_loss(self, preds, targs, infos):

        inp_res = infos['inp_res']
        root_id = infos['root_id']
        batch_size = infos['batch_size']
        flag = targs['flag_3d']
        batch_3d_size = flag.sum()

        flag = flag.bool()

        # accumulated over all prediction stages
        total_loss = torch.zeros(1).cuda()
        mano_losses = {}

        gt_pose = targs['pose']
        gt_shape = targs['shape'].float()
        gt_kp2d = targs['kp2d'].float()
        gt_joint3d = targs['joint'] * 1000.0
        gt_joint3d = gt_joint3d - gt_joint3d[:, root_id:root_id+1, :]

        for idx, pred in enumerate(preds):

            pred_pose = pred['pose']
            pred_shape = pred['shape']
            pred_kp2d = pred['kp2d']
            pred_joint3d = pred['joint']
            pred_joint3d = pred_joint3d - pred_joint3d[:, root_id:root_id + 1, :]

            if self.lambda_pose:
                pose_loss = self.criterion_pose(pred_pose, gt_pose) * self.lambda_pose
                mano_losses['pose_%d' % idx] = pose_loss
                total_loss += pose_loss

            if self.lambda_shape:
                # Regularize shape toward the mean (zero betas); direct gt_shape supervision is disabled.
                #shape_loss = self.criterion_shape(pred_shape, gt_shape) * self.lambda_shape
                shape_loss = self.criterion_shape(pred_shape, torch.zeros_like(pred_shape)) * self.lambda_shape
                mano_losses['shape_%d' % idx] = shape_loss
                total_loss += shape_loss

            if self.lambda_joint3d:
                joint3d_loss = self.criterion_joint3d(pred_joint3d, gt_joint3d) * self.lambda_joint3d
                mano_losses['joint3d_%d' % idx] = joint3d_loss
                total_loss += joint3d_loss

            if self.lambda_kp2d:
                kp2d_loss = self.criterion_kp2d(pred_kp2d, gt_kp2d) * self.lambda_kp2d
                mano_losses['kp2d_%d' % idx] = kp2d_loss
                total_loss += kp2d_loss

        mano_losses["total"] = total_loss

        return total_loss, mano_losses, batch_3d_size
jhugestar commented 3 years ago

Hey, thanks for your interest. This depends heavily on how you define your axis-angle representation during pre-processing. I would suggest double-checking that the angles are consistently defined. It's very unlikely that adding an axis-angle loss significantly breaks the overall performance (it would change the performance a bit, though). Think about corner cases: for example, a rotation of 360 degrees is the same as one of -360 degrees. The issue is quite generic and may not be specific to hand pose estimation.
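To make the corner case concrete: two different axis-angle vectors can encode the exact same rotation, yet an MSE on the raw vectors sees them as far apart. A minimal stdlib-only sketch (rotation restricted to the z-axis for simplicity; `rotz` is my own helper, not part of frankmocap):

```python
import math

def rotz(theta):
    """Rotation matrix about the z-axis, as a nested-list 3x3."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

theta = math.pi / 2
# Axis-angle vectors theta*z and (theta - 2*pi)*z encode the SAME rotation...
R1 = rotz(theta)
R2 = rotz(theta - 2 * math.pi)
same = all(abs(a - b) < 1e-9
           for r1, r2 in zip(R1, R2) for a, b in zip(r1, r2))

# ...but MSE on the raw axis-angle vectors reports a large error.
aa1 = [0.0, 0.0, theta]
aa2 = [0.0, 0.0, theta - 2 * math.pi]
mse = sum((a - b) ** 2 for a, b in zip(aa1, aa2)) / 3

print(same)  # True: identical rotation matrices
print(mse)   # (2*pi)^2 / 3, a huge "error" for a zero actual difference
```

So if the pre-processing ever wraps angles inconsistently, the axis-angle loss can punish predictions that are geometrically correct.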

youngstu commented 3 years ago

I used the FreiHAND dataset and did not modify the original pose. I also disabled data augmentation.

penincillin commented 3 years ago

@youngstu Thanks for your interest in our work. However, it turns out our issue section is overwhelmed by your issues, which makes it hard for other people to use. If you don't mind, please email me (you should be able to find my email address on my GitHub home page) and I will try to answer your questions.

lvZic commented 1 year ago

@youngstu I met the same problem with the FreiHAND dataset. I found that the diversity of the images' focal lengths may contribute to it, but I still haven't found a way to make training converge better. I suspect some training tricks are needed. Have you fixed it yet?