CuddleSabe closed this issue 6 months ago
Too scary...
Hello, can you provide the corresponding LR image and the corresponding inference command?
The command parameters are the defaults.
The images are all real-world SR (realsr) cases and have MPEG degradation.
Hello, the two images above are the test results obtained on our end using the default command settings. There are significant differences between our test results and yours.
We observe that the resolution of your super-resolution results is equal to the resolution of your input images. Please make sure to set `--upscale` to 4 during inference.
There is room for improvement in the facial results of the second image, which could be attributed to two possible reasons.
- During our training, we only simulated degradations such as noise, blur, and JPEG compression, following the RealESRGAN pipeline. For unknown degradations like MPEG degradation, the model's generalization ability is limited.
- The currently open-source model is trained for general scenarios and may have limited performance in specialized scenarios such as faces. In the future, we are planning to train a specialized version called SeeSR-face specifically for facial scenarios. Please stay tuned for updates on this.
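For anyone who wants to check the resolution point on their own outputs, here is a minimal sketch. The function name `is_upscaled` and the example sizes are hypothetical, and it assumes the output is exactly the input size multiplied by the upscale factor, which may not hold if a pipeline pads or rounds image sizes.

```python
def is_upscaled(lr_size, sr_size, upscale=4):
    """Return True if sr_size is exactly lr_size scaled by `upscale`.

    Sizes are (width, height) tuples. This is a hypothetical helper,
    not part of the SeeSR code base.
    """
    lr_w, lr_h = lr_size
    return sr_size == (lr_w * upscale, lr_h * upscale)


# Hypothetical sizes: a 128x96 input should give a 512x384 result at x4.
print(is_upscaled((128, 96), (512, 384)))  # output was upscaled by 4
print(is_upscaled((128, 96), (128, 96)))   # output has the input resolution
```

If the second case matches your results, the inference most likely ran without `--upscale` set to 4.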
Thanks for your reply! Well, that's what confuses me. In my experience, an x1 model trained with the Real-ESRGAN degradation is able to handle inputs that need 4x upsampling (because of the resize step in the degradation), so why can't the x4 model handle x1 input? LOL
SeeSR runs its diffusion process in the latent space.
When you input an x1 LR image, its already low resolution is reduced further by the VAE encoder's compression (spatial resolution reduced by a factor of 8). In that state, it is difficult to preserve the spatial structure within the limited latent space.
This pushes the model towards uncontrollable generation, which also explains why the second facial image appears somewhat strange.
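To make the arithmetic concrete, here is a rough sketch of how the latent grid shrinks. The factor of 8 comes from the explanation above. The helper `latent_size`, the 128x128 example, and the assumption that the LR image is resized by the upscale factor before VAE encoding are illustrative, not taken from the SeeSR code.

```python
VAE_DOWNSAMPLE = 8  # spatial reduction of the VAE encoder, as stated above


def latent_size(lr_width, lr_height, upscale):
    """Approximate latent grid size (width, height).

    Assumes the LR image is resized by `upscale` before it is encoded;
    this is a hypothetical helper for illustration only.
    """
    return (
        lr_width * upscale // VAE_DOWNSAMPLE,
        lr_height * upscale // VAE_DOWNSAMPLE,
    )


# Hypothetical 128x128 LR image, compared at x1 and x4.
for upscale in (1, 4):
    w, h = latent_size(128, 128, upscale)
    print(f"x{upscale}: latent grid {w}x{h} ({w * h} positions)")
```

Under these assumptions, the x1 case leaves a 16x16 latent grid, 16 times fewer positions than the 64x64 grid at x4, which is very little room to hold facial structure.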