ctom2/colie

[ECCV 2024] This is the official code for the paper "Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations"
Apache License 2.0

Why Minimize the Output of the Residual Network? #1

Closed WillCheung2016 closed 3 months ago

WillCheung2016 commented 3 months ago

Hi,

Thanks for open-sourcing the code. I noticed something in the paper that I cannot understand; please enlighten me.

At this line, illu_lr = illu_res_lr + img_v_lr. When computing the fidelity loss at this line, the term illu_lr - img_v_lr simply equals illu_res_lr, so the fidelity loss just minimizes the output of the network. This term seems more like a regularization loss than a supervised loss, since it does not use any label information. I think the fidelity loss is supposed to be a supervised loss comparing the model's output with the V component of the image. Could you please correct me if my understanding is wrong?
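For concreteness, here is a minimal check of that algebra (a sketch with random tensors standing in for the actual data; the variable names follow the repository, but the shapes are illustrative assumptions):

import torch

img_v_lr = torch.rand(1, 1, 64, 64)      # V component of the HSV input (low-res)
illu_res_lr = torch.randn(1, 1, 64, 64)  # residual predicted by the network

illu_lr = illu_res_lr + img_v_lr  # definition used in the repository

# The fidelity term telescopes: (illu_lr - img_v_lr)^2 == illu_res_lr^2,
# so the loss reduces to the mean squared norm of the network's output.
fidelity = torch.mean(torch.pow(illu_lr - img_v_lr, 2))
residual_norm = torch.mean(torch.pow(illu_res_lr, 2))
assert torch.allclose(fidelity, residual_norm)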

ctom2 commented 3 months ago

Thank you for your interest in our work. Our model does not rely on supervised label information. Instead, it is trained directly on the input image, leveraging only the image's inherent information through zero-shot optimization guided by assumptions from Retinex theory.

Therefore, the weights of the network need to be re-optimized for each input image, so that the model fits that specific image.

Please see more details about the loss function composition here.
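For readers new to this setting, the following is a minimal sketch of per-image, zero-shot fitting (the network constructor, step count, and learning rate are illustrative assumptions, not the repository's exact configuration, and only the fidelity term of the full objective is written out):

import torch

def fit_single_image(img_v_lr, make_network, steps=100, lr=1e-4):
    # A fresh network is optimized for this one image; no labels are used.
    net = make_network()
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        illu_res_lr = net(img_v_lr)       # predicted residual
        illu_lr = illu_res_lr + img_v_lr  # estimated illumination
        # The full objective adds Retinex-motivated regularizers
        # (smoothness, exposure, ...) to this fidelity term.
        loss = torch.mean(torch.pow(illu_lr - img_v_lr, 2))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return illu_lr.detach()

The point is that optimization restarts from scratch for every input, so "supervision" here means self-consistency terms derived from the image itself, not ground-truth labels.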

WillCheung2016 commented 3 months ago

Thanks for replying. As far as I understand the paper, together with the code at this line, this mean-square loss should use the V component of the HSV image, img_v_lr, as the supervision signal.

loss_spa = torch.mean(torch.abs(torch.pow(illu_lr - img_v_lr, 2)))

My question is: if you define illu_lr = illu_res_lr + img_v_lr, then the supervision signal in the mean-square loss is cancelled out, and the loss becomes equivalent to minimizing the network's output illu_res_lr.