
[ECCV 2024 - Oral] HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Apache License 2.0


Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang1 · Yulun Zhang2 · Fisher Yu1

1ETH Zürich     2MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University

ECCV 2024 - Oral

[Paper] | [Supp] | [Video] | [🤗Hugging Face] | [Visual Results] | [Models]


Abstract: Transformers have exhibited promising performance in computer vision tasks, including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention, whose computational complexity is quadratic in window size, forcing fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), boosting SR performance with multi-scale features while maintaining an efficient design. Specifically, we first replace the commonly used fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Considering the intensive computation required for large windows, we further design a spatial-channel correlation method whose complexity is linear in window size, efficiently gathering spatial and channel information from hierarchical windows. Extensive experiments verify the effectiveness and efficiency of our HiT-SR, and our improved versions of SwinIR-Light, SwinIR-NG, and SRFormer-Light yield state-of-the-art SR results with fewer parameters, fewer FLOPs, and faster speeds (~7x).
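To make the complexity argument concrete, here is a minimal, self-contained sketch of correlation computed across channels rather than tokens. It is an illustrative simplification, not the repository's actual spatial-channel correlation module: the attention map is C×C instead of N×N, so the cost grows linearly with the number of tokens N in a hierarchical window.

```python
import torch

def channel_self_correlation(x):
    # x: (B, N, C) tokens from one (possibly large) hierarchical window.
    # The correlation matrix is C x C, so the cost is linear in N,
    # unlike the N x N attention map of standard window self-attention.
    b, n, c = x.shape
    q = x.transpose(1, 2)              # (B, C, N)
    corr = (q @ x) / n                 # (B, C, C): channel correlation
    attn = corr.softmax(dim=-1)
    return (attn @ q).transpose(1, 2)  # back to (B, N, C)

out = channel_self_correlation(torch.randn(2, 64, 60))  # e.g. an 8x8 window, 60 channels
print(out.shape)  # torch.Size([2, 64, 60])
```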

🔥 News

🛠️ Setup

```bash
git clone https://github.com/XiangZ-0/HiT-SR.git
cd HiT-SR
conda create -n HiTSR python=3.8
conda activate HiTSR
pip install -r requirements.txt
python setup.py develop
```
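After installation, a quick smoke test of the environment (only torch is safe to assume here; see requirements.txt for the full dependency list):

```python
# Verify that PyTorch is importable and check whether a GPU is visible.
import torch
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
```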

💿 Datasets

Training and testing sets can be downloaded as follows:

| Training Set | Testing Set | Visual Results |
| :--- | :--- | :--- |
| DIV2K (800 training images, 100 validation images) [organized training dataset DIV2K: One Drive] | Set5 + Set14 + BSD100 + Urban100 + Manga109 [complete testing dataset: One Drive] | One Drive |

Download the training and testing datasets and put them into the corresponding folders under datasets/. See datasets for details of the directory structure.
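A small check like the following can confirm the layout before training. The folder names below follow the common DIV2K/BasicSR convention and are assumptions; adjust them to match the structure documented in datasets/:

```python
from pathlib import Path

# Hypothetical folder names (DIV2K/BasicSR convention); adjust to the
# actual structure documented in datasets/.
expected = [
    "datasets/DIV2K/DIV2K_train_HR",
    "datasets/DIV2K/DIV2K_train_LR_bicubic/X4",
    "datasets/benchmark/Set5/HR",
    "datasets/benchmark/Urban100/LR_bicubic/X4",
]
for p in expected:
    print(f"{p}: {'found' if Path(p).is_dir() else 'missing'}")
```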

🚀 Models

| Method | #Param. (K) | FLOPs (G) | Dataset | PSNR (dB) | SSIM | Model Zoo | Visual Results |
| :--- | ---: | ---: | :--- | ---: | ---: | :--- | :--- |
| HiT-SIR | 792 | 53.8 | Urban100 (x4) | 26.71 | 0.8045 | One Drive | One Drive |
| HiT-SNG | 1032 | 57.7 | Urban100 (x4) | 26.75 | 0.8053 | One Drive | One Drive |
| HiT-SRF | 866 | 58.0 | Urban100 (x4) | 26.80 | 0.8069 | One Drive | One Drive |

FLOPs are computed with the output size fixed to 1280x720.
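A sketch along the following lines reproduces that convention; it assumes fvcore (not listed in requirements.txt) and a model whose forward takes a single LR tensor:

```python
import torch
from fvcore.nn import FlopCountAnalysis  # assumed extra dependency

def count_gflops(model, scale=4, out_h=720, out_w=1280):
    # Fix the output at 1280x720, so the LR input is (out_h/scale, out_w/scale).
    x = torch.randn(1, 3, out_h // scale, out_w // scale)
    model.eval()
    with torch.no_grad():
        return FlopCountAnalysis(model, x).total() / 1e9  # GFLOPs
```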

🏋 Training

🧪 Testing

Test with ground-truth images

Test without ground-truth images
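For reference, testing with ground-truth images boils down to computing fidelity metrics against the HR references. Below is a toy PSNR routine, not the repo's exact evaluation script (SR papers typically evaluate on the Y channel and crop a scale-sized border, as sketched here):

```python
import numpy as np

def psnr(sr, hr, crop=4):
    # sr, hr: (H, W, C) arrays in [0, 255]; crop = SR scale factor.
    sr = sr[crop:-crop, crop:-crop].astype(np.float64)
    hr = hr[crop:-crop, crop:-crop].astype(np.float64)
    mse = np.mean((sr - hr) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)
```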

📊 Results

We apply our HiT-SR approach to SwinIR-Light, SwinIR-NG, and SRFormer-Light, yielding HiT-SIR, HiT-SNG, and HiT-SRF, respectively. Compared with the original architectures, our improved models achieve better SR performance while reducing the computational burden.

More detailed results can be found in the paper. All visual results can be downloaded here.

More results (click to expand):

- Quantitative comparison
- [Local attribution map (LAM)](https://x-lowlevel-vision.github.io/lam.html) comparison (more marked pixels indicate better information aggregation ability)
- Qualitative comparison on challenging scenes

📎 Citation

If you find the code helpful in your research or work, please consider citing the following paper.

```bibtex
@inproceedings{zhang2024hitsr,
    title={HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution},
    author={Zhang, Xiang and Zhang, Yulun and Yu, Fisher},
    booktitle={ECCV},
    year={2024}
}
```

🏅 Acknowledgements

This project is built on DAT, SwinIR, NGramSwin, SRFormer, and BasicSR. Special thanks for their excellent work!