IIGROUP / MANIQA

[CVPRW 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
Apache License 2.0

Hi! A small question about the weight score. Looks weird. #36

Closed. lllllllllllll-llll closed this issue 1 year ago

lllllllllllll-llll commented 1 year ago

The authors divide the whole input image into N×N small patches and use one branch to compute a weight score for each patch. My question: if every small patch of the image has the same score (say, all of them are 1.4), how does the network learn different scores for different patches? In other words, if the initial score of each patch is the same, how is the network able to learn different scores for these patches?

TianheWu commented 1 year ago

Hi, that is a good question. The whole process is learned, and we cannot control the internal parameters directly. However, the assumption that "the score of each small patch of the whole image is the same" is incorrect; this phenomenon will not appear.
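One way to see why the patch weights need not stay identical is a small numeric sketch. It assumes the image-level prediction is a weight-normalized average of per-patch scores, which is an assumption made here for illustration and is not confirmed in this thread. The names `weighted_score` and `weight_gradients` and all numbers are hypothetical.

```python
def weighted_score(scores, weights):
    """Image-level prediction: weight-normalized average of per-patch scores."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


def weight_gradients(scores, weights, target):
    """Gradient of the squared error (pred - target)**2 w.r.t. each patch weight."""
    total = sum(weights)
    pred = weighted_score(scores, weights)
    # d(pred)/d(w_i) = (s_i - pred) / sum(w), so each patch gets its own gradient.
    return [2.0 * (pred - target) * (s - pred) / total for s in scores]


# Hypothetical values: three patches, identical initial weights, one image label.
scores = [0.2, 0.5, 0.9]   # assumed outputs of a per-patch scoring branch
weights = [1.0, 1.0, 1.0]  # assumed outputs of a per-patch weighting branch
target = 0.7               # the only supervision: a single image-level score

grads = weight_gradients(scores, weights, target)
for i, g in enumerate(grads):
    print(f"patch {i}: d(loss)/d(weight) = {g:+.4f}")
```

Under these assumptions, the single label supervises only the pooled prediction, not each patch. The gradient for a patch depends on how far that patch's score is from the current prediction, so the weights receive different updates even though they start out equal.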

lllllllllllll-llll commented 1 year ago

Hi! Thank you for your reply! The reason I say "the score of each small patch of the whole image is the same" is this: during training, the whole image is the input to the network and has to be cropped into small patches, but we only have a score for the whole image. That is, each small patch inherits its score from the whole image, so every patch has the same score. How, then, can different weights be learned from small patches that all have the same score? This seems contradictory.

TianheWu commented 1 year ago

Hi, I understand what you mean. In fact, when a person evaluates an image, he or she is likely not to pay attention to all of the information in it. Cropping can be understood as a form of data augmentation, and it also adapts the input size to the pre-trained ViT weights.
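To make the cropping point concrete, here is a minimal sketch of a random crop in plain Python. The crop size of 224 and the helper name `random_crop` are assumptions for illustration; the thread does not state the actual input size or how the repository implements cropping.

```python
import random


def random_crop(image, crop_size, rng=random):
    """Return a crop_size x crop_size window from an image stored as a list of rows."""
    height, width = len(image), len(image[0])
    if height < crop_size or width < crop_size:
        raise ValueError("image is smaller than the crop size")
    # Pick the top-left corner uniformly among all positions that fit.
    top = rng.randint(0, height - crop_size)
    left = rng.randint(0, width - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]


CROP_SIZE = 224  # assumed input size expected by the pre-trained backbone

# Hypothetical 384 x 500 image; each "pixel" stores its own (row, col) position.
image = [[(r, c) for c in range(500)] for r in range(384)]

crop = random_crop(image, CROP_SIZE, random.Random(0))
print(len(crop), len(crop[0]), crop[0][0])
```

Each call returns a different window of the same image, which is why cropping acts as augmentation, while the fixed output size keeps the input compatible with a backbone pre-trained at that resolution.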