facebookresearch / TemporallyConsistentDepth

Code for our CVPR 2023 paper on online, temporally consistent depth estimation.

About the metrics and reproducing results #11

Open TomTomTommi opened 1 month ago

TomTomTommi commented 1 month ago

Hi, very interesting work!

Is there a command to reproduce the RAFTStereo results in Table 3 (the numeric metrics, not the visualizations)? In other words, how should metrics.py be used for evaluation?

For OPW, shouldn't the equation in the original paper be divided by the number of pixels m, just as in RTC?
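
To make the normalization question concrete, here is a minimal sketch of a warping error divided by the number of valid pixels m. It is not taken from metrics.py: the function name `warping_error`, its arguments, and the choice to count only masked-in pixels are assumptions for illustration.

```python
import numpy as np

def warping_error(depth_t, warped_depth_prev, valid_mask):
    """Mean absolute difference over valid pixels (hypothetical helper).

    depth_t:            depth at frame t, shape (H, W)
    warped_depth_prev:  depth at frame t-1 warped into frame t, shape (H, W)
    valid_mask:         boolean mask of pixels with a reliable warp, shape (H, W)
    """
    valid = valid_mask.astype(bool)
    m = int(valid.sum())  # number of valid pixels
    if m == 0:
        return 0.0  # nothing to compare; assumed convention
    abs_diff = np.abs(depth_t - warped_depth_prev)
    # Dividing by m keeps the score independent of image resolution.
    return float(abs_diff[valid].sum() / m)

depth_t = np.array([[1.0, 2.0], [3.0, 4.0]])
warped = np.array([[1.5, 2.0], [2.0, 9.0]])
mask = np.array([[True, True], [True, False]])
print(warping_error(depth_t, warped, mask))  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```

Without the division by m, the same sequence evaluated at a higher resolution would give a larger number, which makes values hard to compare across setups.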

Also, the implementations of TCC and TCM use two different SSIM functions. What is the difference between them, and why not use a single, unified SSIM?

    def TCC(self, d0, d1, gt0, gt1, mask=None):
        if mask == None:
            mask = torch.ones_like(d0).to(d0.get_device())

        ssimloss = SSIM(1.0, nonnegative_ssim=True)
        return  ssimloss( (torch.abs(d1 - d0) * mask.float()).expand(-1, 3, -1, -1),
                          (torch.abs(gt1 - gt0) * mask.float()).expand(-1, 3, -1, -1) )

    def TCM(self, d0, d1, gt0, gt1, mask=None):
        if mask == None:
            mask = torch.ones_like(d0).to(d0.get_device())

        b, _, h, w = d0.shape
        ssimloss = SSIM(1.0, nonnegative_ssim=True, size_average=False)

        dmax = torch.max(gt0.view(b, -1), -1)[0].view(b, 1, 1, 1).expand(-1, 3, -1, -1)
        dmin = torch.min(gt0.view(b, -1), -1)[0].view(b, 1, 1, 1).expand(-1, 3, -1, -1)

        d0_ = (d0.expand(-1, 3, -1, -1).to(self.device) - dmin) / (dmax - dmin) * 255.
        d1_ = (d1.expand(-1, 3, -1, -1).to(self.device) - dmin) / (dmax - dmin) * 255.
        flow = self.oflow( d0_, d1_ )

        gt0_ = (gt0.expand(-1, 3, -1, -1).to(self.device) - dmin) / (dmax - dmin) * 255.
        gt1_ = (gt1.expand(-1, 3, -1, -1).to(self.device) - dmin) / (dmax - dmin) * 255.
        flow_gt = self.oflow( gt0_, gt1_ )
        flow_mask = torch.sum(flow > self.flow_limit, 1, keepdim=True) == 0

        mask = torch.logical_and(flow_mask, mask)

        ssim = torch.mean(ssimloss( torch.cat( (flow, torch.ones_like(flow[:, 0, None, ...])), 1) * mask.expand(-1, 3, -1, -1),
                                    torch.cat( (flow_gt, torch.ones_like(flow[:, 0, None, ...])), 1) * mask.expand(-1, 3, -1, -1) )[:, :2])
        return ssim
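
The TCM code above rescales both the prediction and the ground truth to a 0 to 255 range using the minimum and maximum of gt0 before calling the optical flow network. The sketch below isolates that step for a single frame in NumPy. The helper name `normalize_with_gt_range` and the small guard against a zero range are assumptions, not part of the repository.

```python
import numpy as np

def normalize_with_gt_range(depth, gt_depth):
    """Rescale `depth` to [0, 255] using the range of `gt_depth` (hypothetical helper)."""
    dmin = gt_depth.min()
    dmax = gt_depth.max()
    # Guard against a constant ground-truth map (assumed; the quoted code has no guard).
    scale = max(float(dmax - dmin), 1e-12)
    return (depth - dmin) / scale * 255.0

gt = np.array([[2.0, 4.0], [6.0, 10.0]])
pred = np.array([[2.0, 6.0], [10.0, 12.0]])
print(normalize_with_gt_range(gt, gt))    # spans exactly 0 to 255
print(normalize_with_gt_range(pred, gt))  # values beyond the GT range exceed 255
```

Because the range comes from the ground truth only, predicted depths outside that range map outside 0 to 255, which may affect the flow estimated on the predicted frames.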

TomTomTommi commented 1 month ago

Following your code, I used RAFTStereo (sceneflow.pth, iter=32) to evaluate on the Sintel Final dataset and got the results below. I checked the details and confirmed that the disparity is converted to depth before the metrics are computed. The optical flow is computed with RAFT (raft-things.pth, iter=20).
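
For reference, the disparity-to-depth conversion mentioned here follows the standard rectified-stereo relation depth = focal length × baseline / disparity. The sketch below is a minimal NumPy version. The function name and the focal length and baseline values are placeholders, not the Sintel calibration or the repository's code.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (meters) for a rectified stereo pair.

    Pixels with disparity <= eps are marked invalid and left at depth 0.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth, valid

# Placeholder calibration values chosen only to make the arithmetic easy to follow.
disp = np.array([[10.0, 20.0], [0.0, 40.0]])
depth, valid = disparity_to_depth(disp, focal_length_px=1000.0, baseline_m=0.1)
print(depth)  # [[10.   5. ] [ 0.   2.5]]
print(valid)  # [[ True  True] [False  True]]
```

If the focal length or baseline differs from what the original evaluation used, every depth-based metric would be scaled accordingly, so this step is worth checking when numbers do not match.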

disp_OPW                  2.14604
disp_OPW_30               0.601138
disp_RTC                  0.68798
disp_RTC_30               0.673705
disp_TCC                  0.643846
disp_TCM                  0.69233

It seems that only TCC is close to the original results. The other three metrics, which rely on optical flow, do not match.