EPFL-VILAB / omnidata

A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]

Inverse in Midas Loss #50

Open jamekuma opened 1 year ago

jamekuma commented 1 year ago

In the MiDaS loss, why are the prediction and target inverted before reg_loss is computed? I couldn't find a corresponding explanation in the MiDaS paper.

import torch.nn as nn

# SSIMAE, GradientMatchingTerm and compute_scale_and_shift are defined elsewhere in the repository.
class MidasLoss(nn.Module):
    def __init__(self, alpha=0.1, scales=4, reduction='image-based'):
        super().__init__()

        self.__ssi_mae_loss = SSIMAE()
        self.__gradient_matching_term = GradientMatchingTerm(scales=scales, reduction=reduction)
        self.__alpha = alpha
        self.__prediction_ssi = None

    def forward(self, prediction, target, mask):
        # Reciprocals of prediction and target; only the reg term below uses them.
        prediction_inverse = 1 / (prediction.squeeze(1)+1e-6)
        target_inverse = 1 / (target.squeeze(1)+1e-6)
        # The SSI loss receives the original, non-inverted tensors.
        ssi_loss = self.__ssi_mae_loss(prediction, target, mask)

        # Align the inverted prediction to the inverted target with a per-image scale and shift.
        scale, shift = compute_scale_and_shift(prediction_inverse, target_inverse, mask.squeeze(1))
        self.__prediction_ssi = scale.view(-1, 1, 1) * prediction_inverse + shift.view(-1, 1, 1)
        reg_loss = self.__gradient_matching_term(self.__prediction_ssi, target_inverse, mask.squeeze(1))
        # Default to the SSI loss alone so that `total` is defined even when alpha <= 0.
        total = ssi_loss
        if self.__alpha > 0:
            total = ssi_loss + self.__alpha * reg_loss

        return total, ssi_loss, reg_loss

elenacliu commented 1 year ago

MiDaS's output is inverse depth, so I guess Omnidata just predicts non-inverse depth (z-buffer depth)?

jamekuma commented 1 year ago

> MiDaS's output is inverse depth, so I guess Omnidata just predicts a non-inverse depth? (z-buffer depth)

However, the SSI loss is computed without any inversion, while the reg loss is computed on the inverted prediction and target. This inconsistency is what confuses me.
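
To make the asymmetry concrete, here is a minimal pure-Python sketch. It is not the Omnidata code: the helpers `fit_scale_shift`, `aligned_mae` and `invert` are hypothetical, and it assumes the scale and shift come from an ordinary least-squares fit, which may differ from what `compute_scale_and_shift` does. It fits the scale and shift once on raw values and once on their reciprocals. A prediction that matches the target exactly up to scale and shift in depth space is no longer an affine match after inversion, so the two terms in `forward` measure agreement in different spaces.

```python
def fit_scale_shift(pred, target):
    """Least-squares scale s and shift t minimising sum((s * p + t - y) ** 2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_y = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    cov = sum((p - mean_p) * (y - mean_y) for p, y in zip(pred, target))
    scale = cov / var_p if var_p > 0 else 0.0
    shift = mean_y - scale * mean_p
    return scale, shift


def aligned_mae(pred, target):
    """Mean absolute error after aligning pred to target with a fitted scale and shift."""
    scale, shift = fit_scale_shift(pred, target)
    return sum(abs(scale * p + shift - y) for p, y in zip(pred, target)) / len(pred)


def invert(values, eps=1e-6):
    """Element-wise reciprocal, using the same epsilon as the snippet in this issue."""
    return [1.0 / (v + eps) for v in values]


# Toy "depth" values and a prediction that is an exact affine function of them.
target = [1.0, 2.0, 4.0, 8.0]
pred = [2.0 * d + 1.0 for d in target]

err_depth = aligned_mae(pred, target)                      # aligned in depth space
err_inverse = aligned_mae(invert(pred), invert(target))    # aligned after inversion

print("aligned MAE on raw values:      ", err_depth)
print("aligned MAE on inverted values: ", err_inverse)
```

In this toy case the first error is essentially zero and the second is not, because taking reciprocals does not commute with a scale-and-shift transform. Whether the mix of spaces in `forward` is intentional is a question only the maintainers can answer.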