dvlab-research / MASA-SR

MASA-SR: Matching Acceleration and Spatial Adaptation for Reference-Based Image Super-Resolution (CVPR2021)

Quite bad results on the linked test data? #1

Closed. mylifeasazucchini closed this issue 3 years ago.

mylifeasazucchini commented 3 years ago

Hi!

First of all, I have to say that I am quite impressed by the possibilities your current work has shown. However, after downloading the test data linked in your repo (CUFED5) and running the testing shell script with both of the pre-trained models made available, I have been quite disappointed by the results:

[attached image: test results]

(The reference images have been scaled down to fit into this post, but I believe they are 500×332 in resolution.)

I have not looked closely into the test script you provided, but I did look at the architecture proposed in the paper, and here are some observations/comments/questions I have:

1. Shouldn't the reference image resolution be 4 times that of the LR input? (This does not seem to be the case with the data provided.)

2. Currently, upon running the test shell script, the output resolution seems to be the same as the input's, so is the shared implementation more like detail enhancement than SR?

3. The input and reference look to be of the same quality, so technically I would not expect massive improvements in the SR result. However, if you could do something like DSLR-based reference SR for a phone's camera, it would be a quite novel and interesting experiment. (This is the goal I hope to achieve using your methodology.)

4. More theoretically, what kind of results would you expect when the input and reference are completely mismatched?

Extremely sorry for the many naive questions I might have asked, but I hope to hear your views on them!

SkyeLu commented 3 years ago

Hi, thanks for your interest in our work.

  1. There is no limit on the reference image resolution; it can be any size (ideally larger than the input).
  2. To evaluate on the standard testing set, we do not feed the input image into the network directly. Instead, we first downscale the input image by a factor of 4 to obtain the LR input, and the original input image is taken as ground truth to evaluate PSNR (see the sketch after this list). See: https://github.com/dvlab-research/MASA-SR/blob/217a1e5031ece5c74fc598ade1e4f4edb809d719/dataloader/dataset.py#L364
  3. As in question 2, the input image fed to the network is actually much smaller than the reference image. Therefore, some of our testing results might not be quite satisfying, as you mentioned, especially on human faces (this needs improvement in future work).
  4. The quality of the output image would likely be similar to that of SISR methods.
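To make the evaluation protocol in point 2 concrete, here is a minimal sketch, assuming bicubic downsampling (the exact preprocessing lives in the `dataloader/dataset.py` file linked above); `masa_sr_model` and the file path are hypothetical placeholders:

```python
# Sketch of the evaluation protocol: the original image is the ground truth,
# its 4x-downscaled copy is the LR network input, and PSNR is computed
# between the ground truth and the super-resolved output.
import numpy as np
from PIL import Image

def make_lr(hr: Image.Image, scale: int = 4) -> Image.Image:
    """Downscale the ground-truth image to produce the LR input."""
    w, h = hr.size
    return hr.resize((w // scale, h // scale), Image.BICUBIC)

def psnr(gt: np.ndarray, pred: np.ndarray) -> float:
    """PSNR between ground truth and prediction for 8-bit images."""
    mse = np.mean((gt.astype(np.float64) - pred.astype(np.float64)) ** 2)
    return 20 * np.log10(255.0 / np.sqrt(mse)) if mse > 0 else float("inf")

hr = Image.open("CUFED5/000_0.png")  # original image = ground truth (hypothetical path)
lr = make_lr(hr)                     # 4x-downscaled input to the network
# sr = masa_sr_model(lr, ref)        # hypothetical call to the MASA-SR network
# print(psnr(np.array(hr), np.array(sr)))
```

This is why the output appears to have "the same resolution as the input": the saved result is 4 times larger than what the network actually receives, since the downscaling happens inside the data loader.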