ZhenglinZhou / STAR

[CVPR 2023] STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection

star loss question #7

Closed chg0901 closed 1 year ago

chg0901 commented 1 year ago

Hello, Zhenglin. I am trying to reimplement the STAR loss in mmpose and have a question. Could you please explain the purpose of the following two lines?

        normal_dist = self.dist_func(normal_error, torch.zeros_like(normal_error).to(normal_error), reduction='none')
        tangent_dist = self.dist_func(tangent_error, torch.zeros_like(tangent_error).to(tangent_error), reduction='none')

As I understand it, the default value of self.dist_func is SmoothL1Loss(), but I am not sure what values normal_dist and tangent_dist hold or what they mean.

If this is not easy to explain here, may I have your WeChat or email? Thanks a lot! Best regards!

ZhenglinZhou commented 1 year ago

Hi @chg0901, thanks for your interest!

normal_error and tangent_error are the projections of the prediction error y_t - mu onto the two ambiguity directions, which are orthogonal to each other. More details can be found in Section 4.2, STAR Loss, of our paper (https://arxiv.org/pdf/2306.02763v1.pdf).

If you have any other questions, feel free to email me (zhouzhenglincs@gmail.com). :)
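To make the reply above concrete, here is a minimal sketch of what the two questioned lines compute. It is not the repository's code: the toy tensors are hypothetical, and PyTorch's built-in F.smooth_l1_loss stands in for the repository's own SmoothL1Loss class, whose parameters may differ.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy inputs: 1 sample, 1 landmark, 2-D coordinates.
error = torch.tensor([[[3.0, 4.0]]])           # y_t - mu, shape [1, 1, 2]
normal_vector = torch.tensor([[[1.0, 0.0]]])   # assumed unit direction, shape [1, 1, 2]
tangent_vector = torch.tensor([[[0.0, 1.0]]])  # orthogonal unit direction, shape [1, 1, 2]

# Dot product of each direction with the error:
# [1, 1, 1, 2] @ [1, 1, 2, 1] -> [1, 1, 1, 1], then squeeze -> [1, 1, 1]
normal_error = torch.matmul(normal_vector.unsqueeze(-2), error.unsqueeze(-1)).squeeze(-1)
tangent_error = torch.matmul(tangent_vector.unsqueeze(-2), error.unsqueeze(-1)).squeeze(-1)

# Comparing against zeros measures how far each projection is from 0.
# reduction='none' keeps one value per landmark instead of averaging.
normal_dist = F.smooth_l1_loss(normal_error, torch.zeros_like(normal_error), reduction='none')
tangent_dist = F.smooth_l1_loss(tangent_error, torch.zeros_like(tangent_error), reduction='none')

print(normal_error.shape, normal_dist.shape)  # torch.Size([1, 1, 1]) torch.Size([1, 1, 1])
print(normal_dist.item(), tangent_dist.item())  # 2.5 3.5
```

In this sketch, normal_dist and tangent_dist are simply the smooth L1 magnitudes of the two projected error components, one per landmark.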

chg0901 commented 1 year ago

Thank you for your kind reply. I have another question, if you don't mind my asking it here. I will also send you an email.

    def ambiguity_guided_decompose(self, error, evalues, evectors):
        # Shapes below assume error is [bs, npoints, 2] and evectors is [bs, npoints, 2, 2].
        bs, npoints = error.shape[:2]
        normal_vector = evectors[:, :, 0]  # [bs, npoints, 2]
        tangent_vector = evectors[:, :, 1]  # [bs, npoints, 2]
        # [bs, npoints, 1, 2] @ [bs, npoints, 2, 1] -> [bs, npoints, 1, 1]
        normal_error = torch.matmul(normal_vector.unsqueeze(-2), error.unsqueeze(-1))
        tangent_error = torch.matmul(tangent_vector.unsqueeze(-2), error.unsqueeze(-1))
        normal_error = normal_error.squeeze(dim=-1)  # [bs, npoints, 1]
        tangent_error = tangent_error.squeeze(dim=-1)  # [bs, npoints, 1]

        # reduction='none' keeps the input shape: [bs, npoints, 1]
        normal_dist = self.dist_func(normal_error, torch.zeros_like(normal_error).to(normal_error), reduction='none')
        tangent_dist = self.dist_func(tangent_error, torch.zeros_like(tangent_error).to(tangent_error), reduction='none')

        normal_dist = normal_dist.reshape(bs, npoints, 1)  # [bs, npoints, 1]
        tangent_dist = tangent_dist.reshape(bs, npoints, 1)  # [bs, npoints, 1]
        dist = torch.cat((normal_dist, tangent_dist), dim=-1)  # [bs, npoints, 2]
        scale_dist = dist / torch.sqrt(evalues + self.EPSILON)  # [bs, npoints, 2]
        scale_dist = scale_dist.sum(-1)  # [bs, npoints]
        return scale_dist

In ambiguity_guided_decompose, you reshape normal_dist and tangent_dist, which implies that their shape before the reshape is something like [bs, npoints], [1, bs, npoints], [npoints, bs], or [npoints, bs, 1]. However, self.dist_func is a loss function such as SmoothL1Loss or WingLoss, so I would expect it to reduce normal_dist and tangent_dist to scalar values. After checking the SmoothL1Loss and WingLoss you wrote, I think that would break the rest of the code.

There are actually four options for self.dist_func. However, your WingLoss() has no reduction parameter (l1_loss and mse_loss use the PyTorch functions, and SmoothL1Loss has a reduction parameter).

    def __init__(self, w=1, dist='smoothl1', num_dim_image=2, EPSILON=1e-5):
        super(STARLoss_v2, self).__init__()
        self.w = w
        self.num_dim_image = num_dim_image
        self.EPSILON = EPSILON
        self.dist = dist
        if self.dist == 'smoothl1':
            self.dist_func = SmoothL1Loss()
        elif self.dist == 'l1':
            self.dist_func = F.l1_loss
        elif self.dist == 'l2':
            self.dist_func = F.mse_loss
        elif self.dist == 'wing':
            self.dist_func = WingLoss()
        else:
            raise NotImplementedError

I am not sure my understanding is right, so I added comments showing the output shape of each line, inferred or guessed. If my understanding is not correct, please let me know.

This reimplementation is almost done. Once I have checked that it is correct, I will share a link so that this good research is easier to use!

ZhenglinZhou commented 1 year ago

Hi @chg0901, thanks for your kind suggestion!

In dist_func, we set reduction='none' so that the output has the same shape as the input, a [bs, n, 1] tensor that holds the distance of each error projection.
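The effect of reduction='none' can be checked with a short sketch. It uses PyTorch's built-in F.smooth_l1_loss as a stand-in for the repository's loss, and the batch size and landmark count are hypothetical.

```python
import torch
import torch.nn.functional as F

bs, npoints = 2, 5                       # hypothetical batch size and landmark count
projected = torch.randn(bs, npoints, 1)  # stands in for normal_error or tangent_error
zeros = torch.zeros_like(projected)

kept = F.smooth_l1_loss(projected, zeros, reduction='none')  # elementwise result
reduced = F.smooth_l1_loss(projected, zeros)                 # default reduction='mean'

print(kept.shape)     # torch.Size([2, 5, 1])
print(reduced.shape)  # torch.Size([])

# The later reshape only works on the unreduced tensor.
kept.reshape(bs, npoints, 1)
try:
    reduced.reshape(bs, npoints, 1)
    reshape_failed = False
except RuntimeError:
    reshape_failed = True
print(reshape_failed)  # True
```

With the default reduction the loss collapses to a single scalar, so the subsequent reshape to [bs, npoints, 1] cannot succeed. That is why the call passes reduction='none'.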

The reduction parameter is supported by l1, l2, and smoothl1, but it seems to be missing from WingLoss(). Thanks! Would you like to open a PR to fix it?

I look forward to your reimplementation in MMPose. If you have any questions, please let me know.
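For the missing reduction argument mentioned above, one possible shape of a fix is sketched below. This is not the repository's WingLoss: the class name WingLossWithReduction, the elementwise formulation, and the defaults omega=10 and epsilon=2 are assumptions based on the commonly cited Wing loss definition. The reduction argument is taken in forward so that it matches the call style self.dist_func(x, zeros, reduction='none') used in this thread.

```python
import math

import torch
import torch.nn as nn


class WingLossWithReduction(nn.Module):
    """Hypothetical elementwise Wing loss; not the repository's WingLoss."""

    def __init__(self, omega=10.0, epsilon=2.0):
        super().__init__()
        self.omega = omega
        self.epsilon = epsilon
        # Constant that joins the log and linear pieces at |x| == omega.
        self.C = omega - omega * math.log(1.0 + omega / epsilon)

    def forward(self, pred, target, reduction='mean'):
        delta = (pred - target).abs()
        loss = torch.where(
            delta < self.omega,
            self.omega * torch.log(1.0 + delta / self.epsilon),  # small errors
            delta - self.C,                                      # large errors
        )
        if reduction == 'none':
            return loss
        if reduction == 'sum':
            return loss.sum()
        if reduction == 'mean':
            return loss.mean()
        raise ValueError(f"unsupported reduction: {reduction}")


wing = WingLossWithReduction()
projected_error = torch.randn(4, 98, 1)  # hypothetical [bs, npoints, 1]
target = torch.zeros_like(projected_error)

per_point = wing(projected_error, target, reduction='none')
scalar = wing(projected_error, target)

print(per_point.shape)  # torch.Size([4, 98, 1])
print(scalar.shape)     # torch.Size([])
```

With reduction='none' the output keeps the [bs, npoints, 1] shape that the later reshape and concatenation in ambiguity_guided_decompose expect.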

ZhenglinZhou commented 1 year ago

I will close this issue. If you have any other questions, feel free to reopen it or drop me an email.