This repository contains the official implementation of the following paper:
SRFormer: Permuted Self-Attention for Single Image Super-Resolution
Yupeng Zhou 1, Zhen Li 1, Chun-Le Guo 1, Song Bai 2, Ming-Ming Cheng 1, Qibin Hou 1
1TMCC, School of Computer Science, Nankai University
2ByteDance, Singapore
In ICCV 2023
[Paper] [Code] [Pretrained Model] [Visual Results] [Demo]
We have uploaded a real-world super-resolution model, trained with the Real-ESRGAN pipeline. We may update its weights in the future.
SRFormer is a new image SR backbone with state-of-the-art performance. The core of SRFormer is PSA, a simple, efficient, and effective attention mechanism that builds large-range pairwise correlations with even less computational burden than the original WSA of SwinIR. SRFormer (ICCV open access link) achieves state-of-the-art performance on classical, lightweight, and real-world image SR.
The table below compares SRFormer with SwinIR under the same training strategy on the DIV2K dataset (×2 SR; PSNR in dB). SRFormer clearly outperforms SwinIR with fewer parameters (10.40M vs. 11.75M) and FLOPs (2741G vs. 2868G). More results can be found here.
Model | Set5 | Set14 | B100 | Urban100 | Manga109 |
---|---|---|---|---|---|
SwinIR | 38.35 | 34.14 | 32.44 | 33.40 | 39.60 |
SRFormer (ours) | 38.45 | 34.21 | 32.51 | 33.86 | 39.69 |
Abstract: In this paper, we introduce SRFormer, a simple yet effective Transformer-based model for single image super-resolution. We rethink the design of the popular shifted window self-attention, expose and analyze several characteristic issues of it, and present permuted self-attention (PSA). PSA strikes an appropriate balance between the channel and spatial information for self-attention, allowing each Transformer block to build pairwise correlations within large windows with even less computational burden. Our permuted self-attention is simple and can be easily applied to existing super-resolution networks based on Transformers. Without any bells and whistles, we show that our SRFormer achieves a 33.86dB PSNR score on the Urban100 dataset, which is 0.46dB higher than that of SwinIR but uses fewer parameters and computations. We hope our simple and effective approach can serve as a useful tool for future research in super-resolution model design. Our code is publicly available at https://github.com/HVision-NKU/SRFormer.
You can apply PSA with just a few lines of code, significantly reducing computational complexity. We omit the head number and relative position encoding for simplicity; you can visit here to view the more detailed code.
```python
# Original MSA in SwinIR:
# qkv = self.qkv(x).reshape(b_, n, 3, self.num_heads, c // self.num_heads).permute(2, 0, 3, 1, 4)

# PSA compresses the channel dimension of K and V, then permutes each 2x2
# spatial neighborhood into channels: K and V become (num_windows*b, n//4, c).
kv = self.kv(x).reshape(b_, self.permuted_window_size[0], 2,
                        self.permuted_window_size[1], 2, 2, c // 4)
kv = kv.permute(0, 1, 3, 5, 2, 4, 6).reshape(b_, n // 4, 2, -1).permute(2, 0, 1, 3)
k, v = kv[0], kv[1]  # each: (num_windows*b, n//4, c)

# PSA keeps the full channel and spatial dimensions of Q: (num_windows*b, n, c)
q = self.q(x)

attn = q @ k.transpose(-2, -1)    # (num_windows*b, n, n//4)
x = (attn @ v).reshape(b_, n, c)  # (num_windows*b, n, c)
x = self.proj(x)
```
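To make the shape bookkeeping concrete, here is a self-contained NumPy sketch of the same permutation trick with toy sizes (single head, no relative position bias, and a random stand-in for the `kv` projection — these are illustrative assumptions, not the repository's actual module):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 2 windows of 8x8 tokens (n = 64), c = 12 channels.
b_, h, w, c = 2, 8, 8, 12
n = h * w
pw_h, pw_w = h // 2, w // 2  # permuted window: 4x4 -> n // 4 KV tokens

x = rng.standard_normal((b_, n, c))

# The KV projection halves the channel dimension (c -> c // 2), as in PSA.
w_kv = rng.standard_normal((c, c // 2))
kv = x @ w_kv                                  # (b_, n, c // 2)

# Permute each 2x2 spatial neighborhood into the channel dimension:
# the token count drops 4x while the channels grow back to c.
kv = kv.reshape(b_, pw_h, 2, pw_w, 2, 2, c // 4)
kv = kv.transpose(0, 1, 3, 5, 2, 4, 6).reshape(b_, n // 4, 2, -1)
kv = kv.transpose(2, 0, 1, 3)
k, v = kv[0], kv[1]                            # each (b_, n // 4, c)

# Q keeps the full spatial resolution and channel count.
q = x                                          # (b_, n, c)

scores = q @ k.transpose(0, 2, 1)              # (b_, n, n // 4)
scores = np.exp(scores - scores.max(-1, keepdims=True))
attn = scores / scores.sum(-1, keepdims=True)  # row-wise softmax
out = attn @ v                                 # (b_, n, c)

print(q.shape, k.shape, out.shape)  # (2, 64, 12) (2, 16, 12) (2, 64, 12)
```

Note how the attention map is `n × n/4` rather than `n × n`: each query still attends across the whole window, but the key/value grid is 4× smaller, which is where the computational saving comes from.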
```shell
cd SRFormer
pip install -r requirements.txt
python setup.py develop
```
We use the same training and testing sets as SwinIR; the following datasets need to be downloaded for training.
Task | Training Set | Testing Set |
---|---|---|
classical image SR | DIV2K (800 training images) or DIV2K +Flickr2K (2650 images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 Download all |
lightweight image SR | DIV2K (800 training images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 Download all |
real-world image SR | DIV2K (800 training images) +Flickr2K (2650 images) + OST (10324 images for sky,water,grass,mountain,building,plant,animal) | RealSRSet+5images |
If the downloaded files need renaming, the Linux `rename` command can easily do it. The training configuration files are located in `./options/train/SRFormer`.
Please note: the "4" in the following commands means four GPUs; modify it according to your configuration. You are also encouraged to modify the YAML files in `options/train/SRFormer/` to adjust more training settings.
```shell
# train SRFormer for classical SR task
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx2_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx3_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_SRx4_scratch.yml

# train SRFormer for lightweight SR task
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx2_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx3_scratch.yml
./scripts/dist_train.sh 4 options/train/SRFormer/train_SRFormer_light_SRx4_scratch.yml
```
```shell
# test SRFormer for classical SR task
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx2.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx3.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx4.yml

# test SRFormer for lightweight SR task
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx2.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx3.yml
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx4.yml

# test SRFormer for real-world SR task
python basicsr/test.py -opt options/test/SRFormer/test_SRFormer-S_x4_real.yml
```
We provide a script so that you can use our pretrained models to upscale your own pictures. We will also release our real-world pretrained models soon.
```shell
# use SRFormer for classical SR task
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx2.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx3.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_DF2Ksrx4.yml --input_dir {dir of your pictures} --output_dir {dir of output}

# use SRFormer for lightweight SR task
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx2.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx3.yml --input_dir {dir of your pictures} --output_dir {dir of output}
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer_light_DIV2Ksrx4.yml --input_dir {dir of your pictures} --output_dir {dir of output}

# use SRFormer for real-world SR task
python basicsr/infer_sr.py -opt options/test/SRFormer/test_SRFormer-S_x4_real.yml --input_dir {dir of your pictures} --output_dir {dir of output}
```
We provide the results on classical, lightweight, and real-world image SR. More results can be found in the [paper](). The visual results of SRFormer can be found in [Visual Results].
Classical image SR
Lightweight image SR
Model size comparison
Real-world image SR
Official pretrained models can be downloaded from Google Drive. To reproduce the results in the paper, download them and put them into the `/PretrainModel` folder.
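A minimal sketch of the expected layout (the weight filename below is purely illustrative, not the actual release name):

```shell
# create the folder at the repository root
mkdir -p PretrainModel
# put the downloaded weights inside, e.g.:
#   PretrainModel/SRFormer_SRx2.pth   (illustrative filename)
ls PretrainModel
```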
We also thank @Phhofm for training a third-party pretrained model; you can visit here to learn more.
You may want to cite:
```bibtex
@article{zhou2023srformer,
  title={SRFormer: Permuted Self-Attention for Single Image Super-Resolution},
  author={Zhou, Yupeng and Li, Zhen and Guo, Chun-Le and Bai, Song and Cheng, Ming-Ming and Hou, Qibin},
  journal={arXiv preprint arXiv:2303.09735},
  year={2023}
}
```
This code is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License for non-commercial use only. Please note that any commercial use of this code requires formal permission prior to use.
The code is based on BasicSR, Swin Transformer, and SwinIR. Please also follow their licenses. Thanks for their awesome work.