StaRainJ / ResNetFusion

The code of "Infrared and visible image fusion via detail preserving adversarial learning"

Hello, your code helped me a lot, but I still have doubts about TV_loss, Image_loss and LaplacianLoss. #3

Open · YannikYang1 opened this issue 2 years ago

YannikYang1 commented 2 years ago

Hi, I understand that this code implements the paper "Infrared and visible image fusion via detail preserving adversarial learning", but I have studied the loss function part (loss.py) carefully for a long time and still have some questions. I hope you can help me answer them; thanks a lot.

  1. Regarding tv_loss in loss.py: I understand this as a loss function usually used for image denoising, but your class TVLoss(nn.Module) differs from the general TVLoss in its return value. The general TVLoss returns self.TVLoss_weight * 2 * (h_tv / count_h + w_tv / count_w) / batch_size, but here you return self.tv_loss_weight * 2 * (h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1]). What does this do, and is tv_loss reflected in the original paper? I did not find tv_loss in the paper. (A sketch of both variants follows after this list.)
  2. In your class LaplacianLoss(nn.Module), the final return value is self.laplacian_filter(x) ** 2. The paper's target edge-enhancement loss is written in terms of the gradient, so why do you return the square of the gradient here?
  3. For the target edge-enhancement loss in the paper, regarding the edge-enhancement weight G(x, y), the code is coefficient = pyramid_addition * alpha/2 + 1. My understanding is that pyramid_addition is G(x, y), but why is it multiplied by alpha/2 and then increased by 1?
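For reference, a minimal sketch of the two TVLoss variants compared in question 1, reconstructed from the snippets quoted above; the class names and the default weight are illustrative, not taken verbatim from loss.py:

```python
import torch
import torch.nn as nn


class TVLossScalar(nn.Module):
    """The 'general' TV loss referred to in question 1: averages the squared
    adjacent-pixel differences into a single scalar per batch."""

    def __init__(self, tv_loss_weight=1):
        super().__init__()
        self.tv_loss_weight = tv_loss_weight

    def forward(self, x):
        batch_size, _, h_x, w_x = x.size()
        h_tv = torch.pow(x[:, :, 1:, :] - x[:, :, :h_x - 1, :], 2).sum()
        w_tv = torch.pow(x[:, :, :, 1:] - x[:, :, :, :w_x - 1], 2).sum()
        count_h = x[:, :, 1:, :].numel() // batch_size   # elements per sample
        count_w = x[:, :, :, 1:].numel() // batch_size
        return self.tv_loss_weight * 2 * (h_tv / count_h + w_tv / count_w) / batch_size


class TVLossMap(nn.Module):
    """The variant quoted from this repository: returns a per-pixel map of
    squared x/y differences, cropped to a common (B, C, H-1, W-1) shape."""

    def __init__(self, tv_loss_weight=1):
        super().__init__()
        self.tv_loss_weight = tv_loss_weight

    def forward(self, x):
        h_x, w_x = x.size(2), x.size(3)
        h_tv = torch.pow(x[:, :, 1:, :] - x[:, :, :h_x - 1, :], 2)  # (B, C, H-1, W)
        w_tv = torch.pow(x[:, :, :, 1:] - x[:, :, :, :w_x - 1], 2)  # (B, C, H, W-1)
        # both slices below have shape (B, C, H-1, W-1), so they can be added
        return self.tv_loss_weight * 2 * (
            h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1]
        )
```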

I hope you can help me with these questions in your free time. Thank you and I wish you all the best! Yours sincerely Yannik Yang.

youweixiansheng commented 2 years ago
  1. I think it is a computational trick so that 'h_tv' and 'w_tv' keep the same shape.
youweixiansheng commented 2 years ago
  2. I am not sure whether negative values are valid for the Gaussian blur or not. Since the gradient map corresponds to the image edge map, I think it might be better when all values are positive.
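If it helps, a minimal sketch of a Laplacian loss that squares the filter response so the edge map is non-negative, along the lines described above; the 3x3 kernel and class layout are assumptions, not the repository's exact loss.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LaplacianLoss(nn.Module):
    """Sketch: second-order edge map via a fixed 3x3 Laplacian kernel,
    squared so all values are non-negative."""

    def __init__(self):
        super().__init__()
        kernel = torch.tensor([[0., 1., 0.],
                               [1., -4., 1.],
                               [0., 1., 0.]]).view(1, 1, 3, 3)
        # fixed (non-trainable) filter
        self.register_buffer("kernel", kernel)

    def laplacian_filter(self, x):
        # apply the same single-channel kernel to each channel independently
        c = x.size(1)
        weight = self.kernel.repeat(c, 1, 1, 1)
        return F.conv2d(x, weight, padding=1, groups=c)

    def forward(self, x):
        # squaring removes the sign of the response, keeping only edge strength
        return self.laplacian_filter(x) ** 2
```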
youweixiansheng commented 2 years ago

> pyramid_addition * alpha/2 + 1

  3. There is an implicit broadcast operator here. However, I think that pyramid_addition * alpha + 1 would be enough.
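A small sketch of how such a coefficient map could weight a per-pixel loss via broadcasting; the shapes, the alpha value, and the per-pixel loss below are stand-ins, not the repository's code:

```python
import torch

# Assumed shapes: a per-pixel edge/saliency map G(x, y) and a scalar alpha.
B, C, H, W = 2, 1, 64, 64
pyramid_addition = torch.rand(B, C, H, W)   # stands in for G(x, y)
alpha = 4.0

# The scalar '1' is broadcast over the whole (B, C, H, W) map, which is the
# implicit broadcast mentioned above; it keeps a baseline weight of 1 everywhere.
coefficient = pyramid_addition * alpha / 2 + 1

# The coefficient map can then scale a per-pixel loss, emphasising edge regions.
per_pixel_loss = torch.rand(B, C, H, W)     # e.g. a squared-error map
weighted_loss = (coefficient * per_pixel_loss).mean()
print(coefficient.shape, weighted_loss.item())
```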
YannikYang1 commented 2 years ago
> 1. I think it is a computational trick so that 'h_tv' and 'w_tv' keep the same shape.

Thank you so much for your reply, I'm full of gratitude. In particular, I want to consult you about tv_loss again, because there is no tv_loss in the author's paper, yet it appears in the code of the paper, and I still don't understand it. You said it could be designed to keep the same shape, but in the code, h_tv[:, :, :h_x - 1, :w_x - 1] and w_tv[:, :, :h_x - 1, :w_x - 1] already have the same shape. And in tv_loss = self.mse_loss(self.tv_loss(out_images), (self.tv_loss(target_images) + self.tv_loss(target_ir))), what shape do they need to keep? I still don't understand the effect of tv_loss in the code, or whether it is redundant. Thank you again for your reply; I wish you all the best. Yours sincerely, Yannik.

youweixiansheng commented 2 years ago

tv_loss = self.mse_loss(self.tv_loss(out_images), (self.tv_loss(target_images) + self.tv_loss(target_ir))), where self.tv_loss computes the sum of the x- and y-axis image gradients. tv_loss is designed to keep the gradients of the fused image consistent with the gradients of the ir and vis images. I think the tv_loss is what the paper calls L_{gradient}.
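Written out functionally, the term described above might look like the sketch below (reconstructed from the quoted line, not verbatim loss.py):

```python
import torch
import torch.nn.functional as F


def tv_map(x):
    """Per-pixel sum of squared x- and y-direction differences,
    cropped to a common (B, C, H-1, W-1) shape."""
    h_x, w_x = x.size(2), x.size(3)
    h_tv = (x[:, :, 1:, :] - x[:, :, :h_x - 1, :]) ** 2   # (B, C, H-1, W)
    w_tv = (x[:, :, :, 1:] - x[:, :, :, :w_x - 1]) ** 2   # (B, C, H, W-1)
    return h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1]


# Keep the fused image's gradients consistent with the combined ir + vis gradients.
out_images = torch.rand(2, 1, 64, 64)      # fused image
target_images = torch.rand(2, 1, 64, 64)   # visible image
target_ir = torch.rand(2, 1, 64, 64)       # infrared image
tv_loss = F.mse_loss(tv_map(out_images), tv_map(target_images) + tv_map(target_ir))
```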

YannikYang1 commented 2 years ago

> tv_loss = self.mse_loss(self.tv_loss(out_images), (self.tv_loss(target_images) + self.tv_loss(target_ir))), where self.tv_loss computes the sum of the x- and y-axis image gradients. tv_loss is designed to keep the gradients of the fused image consistent with the gradients of the ir and vis images. I think the tv_loss is what the paper calls L_{gradient}.

Thank you very much for your answer, I seem to understand part of it, but I found that there is no gradient operator in the calculation of tv_loss. This code computes the difference between the intensities of adjacent pixels in the x- and y-directions of the image, so I can only see an intensity calculation that reflects the contrast of the image, not a gradient calculation that reflects texture details. Even assuming it is a gradient calculation, according to the original text L_{gradient} = (D_v - D_f)^2, the code should be tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images)). Why does it keep the fused image consistent with the sum of the ir and vis images' adjacent-pixel differences? I sincerely thank you for answering this question, which has bothered me for a long time.

youweixiansheng commented 2 years ago
h_tv = torch.pow((x[:, :, 1:, :] - x[:, :, :h_x - 1, :]), 2)  ---> h_tv: (B, C, H-1, W)
w_tv = torch.pow((x[:, :, :, 1:] - x[:, :, :, :w_x - 1]), 2)  ---> w_tv: (B, C, H, W-1)

image gradient
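A quick check of those shapes, purely for illustration:

```python
import torch

x = torch.rand(2, 1, 64, 64)   # (B, C, H, W)
h_x, w_x = x.size(2), x.size(3)
h_tv = torch.pow(x[:, :, 1:, :] - x[:, :, :h_x - 1, :], 2)
w_tv = torch.pow(x[:, :, :, 1:] - x[:, :, :, :w_x - 1], 2)
print(h_tv.shape)  # torch.Size([2, 1, 63, 64])
print(w_tv.shape)  # torch.Size([2, 1, 64, 63])
# cropping both to (B, C, H-1, W-1) lets them be added element-wise
print((h_tv[:, :, :h_x - 1, :w_x - 1] + w_tv[:, :, :h_x - 1, :w_x - 1]).shape)  # [2, 1, 63, 63]
```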

youweixiansheng commented 2 years ago

> tv_loss = self.mse_loss(self.tv_loss(out_images), self.tv_loss(target_images))

I agree with you.

YannikYang1 commented 2 years ago
> h_tv = torch.pow((x[:, :, 1:, :] - x[:, :, :h_x - 1, :]), 2)  ---> h_tv: (B, C, H-1, W)
> w_tv = torch.pow((x[:, :, :, 1:] - x[:, :, :, :w_x - 1]), 2)  ---> w_tv: (B, C, H, W-1)
>
> image gradient

Thank you so much for your answer; today I finally figured out what tv_loss means. But is the square of the intensity difference of adjacent pixels really the gradient? And why is the gradient usually computed by convolving with a gradient-operator kernel instead of squaring the difference of adjacent pixels?
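For comparison, both views of the gradient can be written side by side: the squared forward difference is a one-pixel discrete gradient approximation, while kernels such as Sobel average over a 3x3 neighbourhood and give a smoother, less noise-sensitive response (a sketch, not the repository's code):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 64, 64)

# (a) finite-difference view used by the TV-style loss above:
#     squared difference of adjacent pixels ~ squared directional gradient
dy = x[:, :, 1:, :] - x[:, :, :-1, :]      # vertical forward difference
dx = x[:, :, :, 1:] - x[:, :, :, :-1]      # horizontal forward difference
grad_sq = dy[:, :, :, :-1] ** 2 + dx[:, :, :-1, :] ** 2

# (b) convolution-kernel view: Sobel filters weight a 3x3 neighbourhood,
#     so the edge response is smoother and less sensitive to pixel noise
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
sobel_y = sobel_x.transpose(2, 3).contiguous()
gx = F.conv2d(x, sobel_x, padding=1)
gy = F.conv2d(x, sobel_y, padding=1)
grad_mag_sq = gx ** 2 + gy ** 2
```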