cswry / SeeSR

[CVPR2024] SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution
Apache License 2.0

Not work for people #4

Closed CuddleSabe closed 6 months ago

CuddleSabe commented 6 months ago

00004

CuddleSabe commented 6 months ago

Too scary...

Screenshot 2023-12-26 17 19 44

cswry commented 6 months ago

Hello, can you provide the corresponding LR image and the corresponding inference command?

CuddleSabe commented 6 months ago

00004 0012

The command parameters are the defaults.

CuddleSabe commented 6 months ago

They are all realsr images, and they have MPEG degradation.

cswry commented 6 months ago

person2 person1

Hello, the two images above are the test results obtained on our end using the default command settings. There are significant differences between our test results and yours.

We noticed that your super-resolution results have the same resolution as your input images. Please make sure to set --upscale to 4 during inference.

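There is room for improvement in the facial results of the second image, which could be attributed to two possible reasons.

  • During our training, we only simulated degradations such as noise, blur, and JPEG compression, following the Real-ESRGAN pipeline. For unknown degradations like MPEG degradation, the model's generalization ability is limited.
  • The currently open-sourced model is trained for general scenarios and may have limited performance in specialized scenarios such as faces. In the future, we are planning to train a specialized version called SeeSR-face specifically for facial scenarios. Please stay tuned for updates on this.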

CuddleSabe commented 6 months ago

> person2 person1
>
> Hello, the two images above are the test results obtained on our end using the default command settings. There are significant differences between our test results and yours.
>
> We observe that the resolution of your provided super-resolution results is equal to the resolution of your input images. Please make sure to set the --upscale to 4 during the inference.
>
> There is room for improvement in the facial results of the second image, which could be attributed to two possible reasons.
>
>   • During our training, we only simulated degradations such as noise, blur, and JPEG compression, following the Real-ESRGAN pipeline. For unknown degradations like MPEG degradation, the model's generalization ability is limited.
>   • The currently open-sourced model is trained for general scenarios and may have limited performance in specialized scenarios such as faces. In the future, we are planning to train a specialized version called SeeSR-face specifically for facial scenarios. Please stay tuned for updates on this.

Thanks for your reply! Well, that's what confuses me. In my experience, an x1 model trained with the Real-ESRGAN degradation can handle 4x-upsampled input (because of the resize step in the degradation), so why can't the x4 model handle x1 input? LOL

cswry commented 6 months ago


SeeSR runs its diffusion process in the latent space.

When you input an x1 LR image, its already low resolution is reduced even further by the VAE encoder's compression (spatial resolution reduced by a factor of 8). In that state, it is difficult to preserve the spatial structure within such a small latent space.

This pushes the model towards uncontrollable generation, which also explains why the second facial image appears somewhat strange.
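
To make the arithmetic concrete, here is a rough sketch of how small the latent grid gets. It is not code from the SeeSR repository: the helper `latent_size` and the 128x128 input size are hypothetical, and it assumes the LR image is simply resized by the upscale factor before the VAE encoder applies the 8x spatial reduction mentioned above.

```python
VAE_DOWNSAMPLE = 8  # spatial reduction of the VAE encoder, as stated in the reply above


def latent_size(height, width, upscale=1, vae_factor=VAE_DOWNSAMPLE):
    """Return (rows, cols) of the latent grid for an LR image.

    Assumption (not taken from the SeeSR code): the LR image is resized
    by `upscale` before it is passed to the VAE encoder.
    """
    return (height * upscale) // vae_factor, (width * upscale) // vae_factor


lr_h, lr_w = 128, 128  # hypothetical LR image size, for illustration only

for upscale in (1, 4):
    rows, cols = latent_size(lr_h, lr_w, upscale)
    print(f"upscale={upscale}: latent grid {rows}x{cols} ({rows * cols} positions)")
```

Under these assumptions, the x1 case leaves the diffusion model a 16x16 latent grid to work with, while the x4 case gives it 64x64, which is 16 times as many latent positions for the same image content.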