andreas128 / SRFlow

Official SRFlow training code: Super-Resolution using Normalizing Flow in PyTorch

Critical bug in LPIPS implementation. #43

Closed sieu-n closed 3 years ago

sieu-n commented 3 years ago

@andreas128 @hwong557 The LPIPS library expects its arguments in a different order from the one your implementation uses. I believe this is an error in your LPIPS implementation.

Instead of

d = loss_fn_alex(img_super_resolution, img_ground_truth) # normalized to [-1, 1]

You must use

d = loss_fn_alex(img_ground_truth, img_super_resolution) # normalized to [-1, 1]

This follows an example snippet in test_network.py from the LPIPS GitHub repository:

## Example usage with images
ex_ref = lpips.im2tensor(lpips.load_image('./imgs/ex_ref.png'))
ex_p0 = lpips.im2tensor(lpips.load_image('./imgs/ex_p0.png'))
ex_p1 = lpips.im2tensor(lpips.load_image('./imgs/ex_p1.png'))
if(use_gpu):
    ex_ref = ex_ref.cuda()
    ex_p0 = ex_p0.cuda()
    ex_p1 = ex_p1.cuda()

ex_d0 = loss_fn.forward(ex_ref,ex_p0)
ex_d1 = loss_fn.forward(ex_ref,ex_p1)

Here ex_ref is the reference (ground-truth) image, and ex_p0 and ex_p1 are the distorted images. The ground-truth image is passed first, before the distorted image.

In your Test.py and Measure.py, the LPIPS inputs are passed in the reverse order: the super-resolved image (sr) first and the ground-truth image (hr) second.

#Test.py
meas['PSNR'], meas['SSIM'], meas['LPIPS'] = measure.measure(sr, hr)

#Measure.py
class Measure():
    def __init__(self, net='alex', use_gpu=False):
        self.device = 'cuda' if use_gpu else 'cpu'
        self.model = lpips.LPIPS(net=net)
        self.model.to(self.device)

    def measure(self, imgA, imgB):
        return [float(f(imgA, imgB)) for f in [self.psnr, self.ssim, self.lpips]]

    def lpips(self, imgA, imgB, model=None):
        tA = t(imgA).to(self.device)
        tB = t(imgB).to(self.device)
        dist01 = self.model.forward(tA, tB).item()
        return dist01
     ...

This could also seriously affect the LPIPS scores you reported in your paper.

sieu-n commented 3 years ago

We discovered this issue because we measured very different LPIPS scores for ESRGAN. In our tests, the official ESRGAN model achieves an LPIPS score of ~0.9.

sieu-n commented 3 years ago

LPIPS doesn't require its two inputs to be passed in a particular order. Sorry for the false claim.
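
For anyone landing here later, a rough sketch of why the order does not matter: LPIPS compares unit-normalized deep features through a weighted squared difference, and a squared difference is the same whichever input comes first. The snippet below is only a toy stand-in in plain Python, not the lpips library itself; `toy_perceptual_distance`, the random "features", and the channel weights are all hypothetical names and data made up for illustration.

```python
import math
import random


def unit_normalize(vec, eps=1e-10):
    """Scale a feature vector to (approximately) unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]


def toy_perceptual_distance(feats_a, feats_b, weights):
    """Weighted squared difference of unit-normalized features,
    averaged over positions.

    feats_a, feats_b: lists of per-position feature vectors
        (hypothetical stand-ins for network activations).
    weights: non-negative per-channel weights (also hypothetical).
    """
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        na, nb = unit_normalize(fa), unit_normalize(fb)
        # (x - y) ** 2 == (y - x) ** 2, so swapping the inputs changes nothing.
        total += sum(w * (x - y) ** 2 for w, x, y in zip(weights, na, nb))
    return total / len(feats_a)


# Random stand-in features for a ground-truth (hr) and a super-resolved (sr) image.
rng = random.Random(0)
channels, positions = 8, 16
weights = [rng.random() for _ in range(channels)]
feats_hr = [[rng.gauss(0, 1) for _ in range(channels)] for _ in range(positions)]
feats_sr = [[rng.gauss(0, 1) for _ in range(channels)] for _ in range(positions)]

d_hr_sr = toy_perceptual_distance(feats_hr, feats_sr, weights)
d_sr_hr = toy_perceptual_distance(feats_sr, feats_hr, weights)
d_same = toy_perceptual_distance(feats_hr, feats_hr, weights)

print(math.isclose(d_hr_sr, d_sr_hr))  # both argument orders agree
```

With the real library, the equivalent sanity check would be to evaluate the loss on the same image pair in both orders and compare the two values.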