Open qiqiing opened 3 years ago

In your "Cross-Scale Non-Local (CS-NL) Attention" section you mention "find pixel-wise similarity between X and Y", but as far as I can tell, the similarity computations in your cross-scale modules are all patch-wise. You also say "the Cross-Scale Non-Local (CS-NL) attention directly utilizes the patches matched to each pixel within this LR image". Why is it "matched to each pixel" here? Could you explain this part? Thank you.
Hi, the initial definition of the CSNLA is indeed "pixel-wise": the correlation is estimated between pixels in X and pixels in Y. However, a pixel in Y represents an SxS patch in X. That is why we say "the Cross-Scale Non-Local (CS-NL) attention directly utilizes the patches matched to each pixel within this LR image".
As described in the later section "Patch-Based Cross-Scale Non-Local Attention", the actual implementation works at the patch level, where the correlation is computed between patches in X and patches in Y. In this case, a PxP patch in Y represents an SPxSP patch in X.
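For reference, here is a minimal PyTorch sketch of the patch-level matching described above. It is not the repository's actual implementation: the function name, the bilinear downscaling, and the defaults s, p, and softmax_scale are illustrative choices, and the sketch uses a non-overlapping grid of Y patches only to stay short.

```python
import torch
import torch.nn.functional as F

def patch_cross_scale_attention(x, s=2, p=3, softmax_scale=10.0):
    """x: (B, C, H, W) feature map, with H and W divisible by s*p and p odd.
    Matches p x p patches of x against p x p patches of the downscaled
    feature y, then aggregates the corresponding (s*p) x (s*p) patches of x
    into a (B, C, s*H, s*W) output."""
    B, C, H, W = x.shape
    # y is the downscaled feature: one pixel of y covers an s x s patch of x
    y = F.interpolate(x, scale_factor=1.0 / s, mode='bilinear',
                      align_corners=False)

    # p x p patches of y act as matching kernels
    y_patch = F.unfold(y, kernel_size=p, stride=p)            # (B, C*p*p, L)
    L = y_patch.shape[-1]
    w_match = y_patch.transpose(1, 2).reshape(B, L, C, p, p)

    # each p x p patch of y corresponds to an (s*p) x (s*p) patch of x,
    # and those larger patches are what get pasted into the HR output
    x_patch = F.unfold(x, kernel_size=s * p, stride=s * p)    # (B, C*(s*p)^2, L)
    w_paste = x_patch.transpose(1, 2).reshape(B, L, C, s * p, s * p)

    outs = []
    for b in range(B):
        # normalized correlation between x and every p x p patch of y
        k = w_match[b]
        k = k / k.flatten(1).norm(dim=1).clamp(min=1e-6).view(-1, 1, 1, 1)
        score = F.conv2d(x[b:b + 1], k, padding=p // 2)       # (1, L, H, W)
        attn = F.softmax(score * softmax_scale, dim=1)        # soft patch matches

        # paste the matched (s*p) x (s*p) patches onto an (s*H, s*W) canvas
        pad = s * (p - 1) // 2
        agg = F.conv_transpose2d(attn, w_paste[b], stride=s, padding=pad)
        # average overlapping contributions
        ones = torch.ones(L, 1, s * p, s * p, device=x.device, dtype=x.dtype)
        cnt = F.conv_transpose2d(attn, ones, stride=s, padding=pad)
        outs.append(agg / cnt.clamp(min=1e-6))
    return torch.cat(outs, dim=0)

# e.g. a 64-channel 24x24 feature map upscaled by s=2 to 48x48
out = patch_cross_scale_attention(torch.randn(1, 64, 24, 24), s=2, p=3)
```

Setting p = 1 recovers the pixel-wise form: each pixel of x is matched against pixels of y, and each such match pastes back an s x s patch of x, which is the "patches matched to each pixel" wording above.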
OK, thank you, I get it now. One more question: under "Feature-wise affinity measurement" you note that "it is likely that many erroneous matches will be synthesized to HR tensors". How does your 'Patch-Based Cross-Scale Non-Local Attention' module solve or mitigate this? I couldn't find the explanation in the paper.
Hi, this is because patch-based matching adds an additional similarity constraint on neighboring features, so it is more robust. This design is similar to classic non-local means filtering, where block-wise matching is better than pixel-wise matching.
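To make the neighborhood constraint concrete, here is a toy comparison of a feature-wise affinity versus a patch-wise affinity. It is not from the paper or the repo; the function name and the cosine-similarity choice are just for illustration.

```python
import torch
import torch.nn.functional as F

def affinity(feat, i, j, p=0):
    """feat: (1, C, H, W). With p=0 only the C-dim feature vectors at
    locations i and j are compared (feature-wise); with p>0 the whole
    (2p+1) x (2p+1) neighborhoods are compared (patch-wise)."""
    padded = F.pad(feat, (p, p, p, p), mode='reflect') if p > 0 else feat
    a = padded[0, :, i[0]:i[0] + 2 * p + 1, i[1]:i[1] + 2 * p + 1].reshape(-1)
    b = padded[0, :, j[0]:j[0] + 2 * p + 1, j[1]:j[1] + 2 * p + 1].reshape(-1)
    return F.cosine_similarity(a, b, dim=0)

feat = torch.randn(1, 64, 32, 32)
print(affinity(feat, (5, 5), (20, 20), p=0))  # feature-wise: single vectors
print(affinity(feat, (5, 5), (20, 20), p=1))  # patch-wise: 3x3 neighborhoods
```

With p = 0 only the two C-dimensional feature vectors are compared, so two locations can score high even when their surroundings disagree; with p > 0 the surrounding features must also match, which is the extra constraint that suppresses erroneous matches.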
Thank you very much. One more thing: I used your cross-scale aggregation module and found that, as the number of training iterations increases, the training loss drops normally, but the validation loss fluctuates significantly (it rises and then falls again). What could be the reason for this, and what should be improved?
Hi, a slight fluctuation on the validation set is normal for SR tasks. I guess as long as the loss keeps dropping, it should be fine.
Is a model with cross-scale modules difficult to converge? My validation loss kept oscillating until the end of training, so it is hard for me to judge whether it has converged. Is there any way to speed up convergence?
If you are using DIV2K and follow the training strategy of this code, the L1 loss should converge and everything should be fine. For CSNLN, I did not observe large fluctuations on the validation set.
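If the trouble is judging convergence from a noisy validation curve, one common trick (not something from this repo) is to look at a smoothed version of the per-epoch validation loss rather than at the raw values. A minimal sketch, with made-up numbers:

```python
def smooth(losses, beta=0.9):
    """Exponential moving average of a list of per-epoch validation losses."""
    out, avg = [], None
    for v in losses:
        avg = v if avg is None else beta * avg + (1 - beta) * v
        out.append(avg)
    return out

# made-up numbers, just to show the idea
val_l1 = [0.052, 0.047, 0.050, 0.045, 0.047, 0.044, 0.046, 0.044, 0.045, 0.044]
print(smooth(val_l1))
```

If the smoothed curve stays roughly flat for many epochs while the training loss no longer improves, the model has effectively converged.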
Yes, I used your cross-scale attention module on my own dataset, not DIV2K. In the middle and late stages of training, I observed that not only the validation loss but also the training loss fluctuates (it rises for a while and then falls again), although the overall trend is downward. Is this normal, and is there a way to improve it? I have tried training with different learning rates, but the effect is not obvious.