XPixelGroup / HAT

CVPR 2023 - Activating More Pixels in Image Super-Resolution Transformer | arXiv - HAT: Hybrid Attention Transformer for Image Restoration
Apache License 2.0

Blocky output #34

Open shreykshah opened 1 year ago

shreykshah commented 1 year ago

The model seems to be outputting images with many large pixel-like squares; they are clearly visible in the enlarged image. The original image is 700×500.

[attached image: squares]

chxy95 commented 1 year ago

@shreykshah This looks like a specific out-of-distribution case, since the released models are trained on natural images. If possible, could you send me the input image so I can check it?

shreykshah commented 1 year ago

@chxy95 This is a zoomed-in crop of the image to highlight the issue, and it is just one example. The same thing has happened on many images, including multiple natural images. The blocks appear across the entire image, in both the foreground and the background.

chxy95 commented 1 year ago

@shreykshah Which model was used to generate the results? Could you send me a sample input that reproduces this phenomenon, by email or any other means? Window-based SA models can indeed produce blocking artifacts for some image restoration tasks, but I haven't observed such severe cases in image SR.
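
For context, window-based self-attention computes attention independently inside fixed-size, non-overlapping windows, which is where block-aligned seams can originate. Below is a minimal sketch of such a window partition (not HAT's actual code; the window size and feature sizes are illustrative):

```python
# Minimal sketch of non-overlapping window partitioning as used by window-based
# self-attention (illustrative values, not HAT's actual code).
import torch

def window_partition(x: torch.Tensor, window_size: int = 16) -> torch.Tensor:
    """Split a feature map (B, H, W, C) into (num_windows * B, ws, ws, C)."""
    b, h, w, c = x.shape
    x = x.view(b, h // window_size, window_size, w // window_size, window_size, c)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, c)

feat = torch.randn(1, 64, 64, 180)    # e.g. a 64x64 feature map with 180 channels
windows = window_partition(feat, 16)  # -> (16, 16, 16, 180): 16 independent windows
# Attention is computed per window, so statistics can differ between adjacent
# windows and show up as block-aligned seams in the output.
```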

shreykshah commented 1 year ago

@chxy95 I tried two different images: one black-and-white, the other full color, both with people in them. I don't feel comfortable sharing the photos, but I tried HAT_SRx2, HAT_SRx3, HAT_SRx4, HAT-L_SRx2_ImageNet-pretrain, HAT-L_SRx3_ImageNet-pretrain, and HAT-L_SRx4_ImageNet-pretrain, and all of them produce blocky results (to varying degrees) on both images.

shahargadshriki commented 1 year ago

I have the same issue.

[attached image showing the same artifacts]

chxy95 commented 1 year ago

This phenomenon does seem to be a flaw in our approach, caused by the fixed window size used for the self-attention calculation. The problem seems difficult to solve within the existing framework. I think it may help to appropriately lower the resolution of the input image first.
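
A minimal sketch of that workaround, assuming OpenCV is available; the downsampling factor and interpolation mode are illustrative, not official guidance:

```python
# Sketch of the suggested workaround: downsample the input before super-resolving
# it, so that each fixed-size attention window covers more scene content.
# The factor 0.5 and INTER_AREA below are assumptions, not values from the repo.
import cv2

img = cv2.imread('input.png')                     # original low-quality input
small = cv2.resize(img, None, fx=0.5, fy=0.5,
                   interpolation=cv2.INTER_AREA)  # INTER_AREA suits downscaling
cv2.imwrite('input_downsampled.png', small)
# Then run the normal HAT test script on 'input_downsampled.png' instead.
```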

shahargadshriki commented 1 year ago

Can you please explain why tile mode doesn't solve it (every tile is a low-resolution input)?

chxy95 commented 1 year ago

I think the blocky phenomenon is caused by the low information density of the input image. In other words, within the fixed window size used for the self-attention computation there is not enough valid information for SR. Tile mode changes the resolution but not the information density, whereas appropriate downsampling may alleviate the phenomenon by changing the information density.

This is also my guess. What I can confirm is that this is indeed caused by the window-based self-attention mechanism in HAT. The blocky problem seems difficult to deal with, although we have tried to alleviate it in our network design.
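
To illustrate why tiling does not change the information density, here is a rough sketch of tiled inference (not the repo's actual tile code). Each tile is only a spatial crop at the original pixel scale, so the content seen by each fixed-size attention window is unchanged; only memory use goes down.

```python
# Rough illustration of tiled inference (not the repo's tile implementation;
# real tile code also pads/overlaps tiles to hide seams at tile borders).
import torch

def tiled_sr(model, lr: torch.Tensor, tile: int = 256, scale: int = 4) -> torch.Tensor:
    b, c, h, w = lr.shape
    out = torch.zeros(b, c, h * scale, w * scale)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = lr[:, :, y:y + tile, x:x + tile]  # a crop at the original pixel scale
            sr = model(patch)                          # same per-window information density
            out[:, :, y * scale:(y + patch.shape[2]) * scale,
                      x * scale:(x + patch.shape[3]) * scale] = sr
    return out
```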

CuddleSabe commented 1 month ago

Well, it looks like the result of using a model trained on an unrealistic (non-real-world) degradation model to process a JPEG image.