PyTorch implementation of the paper "The Devil Is in the Details: Window-based Attention for Image Compression" (CVPR 2022).
This repository is based on CompressAI. We kept the scripts for training and evaluation and removed other components. The major changes are in compressai/models. For the official code release, see the CompressAI repository.
This repo defines the CNN-based models and Transformer-based models for learned image compression in "The Devil Is in the Details: Window-based Attention for Image Compression".
The architecture of CNN-based model.
The architecture of Transformer-based model (STF).
Install CompressAI and the packages required for development.
conda create -n compress python=3.7
conda activate compress
pip install compressai
pip install pybind11
git clone https://github.com/Googolxx/STF stf
cd stf
pip install -e .
pip install -e '.[dev]'
Note: wheels are available for Linux and macOS.
An example training script with a rate-distortion loss is provided in train.py.
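As a rough illustration of what a rate-distortion loss computes, here is a minimal pure-Python sketch. It assumes the CompressAI-style convention for MSE-optimized models, `loss = lambda * 255^2 * MSE + bpp`, with pixel values in [0, 1]. The function name `rate_distortion_loss` and its arguments are hypothetical and chosen for illustration; the actual loss in train.py operates on PyTorch tensors and may differ in detail.

```python
import math


def rate_distortion_loss(x, x_hat, likelihoods, num_pixels, lmbda=0.0035):
    """Sketch of an MSE-based rate-distortion loss (hypothetical helper).

    x, x_hat    : flat sequences of pixel values in [0, 1] (original, reconstruction)
    likelihoods : flat sequence of probabilities the entropy model assigns
                  to the latent symbols
    num_pixels  : number of spatial positions (batch * height * width)
    lmbda       : trade-off weight, e.g. the value passed via --lambda
    """
    # Rate: estimated code length in bits, normalized per pixel.
    bpp = sum(-math.log2(p) for p in likelihoods) / num_pixels
    # Distortion: mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Assumed convention: scale the MSE to the 8-bit range before weighting.
    loss = lmbda * 255 ** 2 * mse + bpp
    return {"loss": loss, "mse": mse, "bpp": bpp}


if __name__ == "__main__":
    out = rate_distortion_loss(
        x=[0.0, 0.0, 0.0, 0.0],
        x_hat=[0.1, 0.1, 0.1, 0.1],
        likelihoods=[0.5, 0.25],
        num_pixels=4,
    )
    print(out)
```

Under this convention, a larger lambda puts more weight on distortion, so the trained model spends more bits to reach a higher reconstruction quality.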
Training a CNN-based model:
CUDA_VISIBLE_DEVICES=0,1 python train.py -d /path/to/image/dataset/ -e 1000 --batch-size 16 --save --save_path /path/to/save/ -m cnn --cuda --lambda 0.0035
e.g., CUDA_VISIBLE_DEVICES=0,1 python train.py -d openimages -e 1000 --batch-size 16 --save --save_path ckpt/cnn_0035.pth.tar -m cnn --cuda --lambda 0.0035
Training a Transformer-based model (STF):
CUDA_VISIBLE_DEVICES=0,1 python train.py -d /path/to/image/dataset/ -e 1000 --batch-size 16 --save --save_path /path/to/save/ -m stf --cuda --lambda 0.0035
To evaluate a trained model on your own dataset, run the evaluation script:
CUDA_VISIBLE_DEVICES=0 python -m compressai.utils.eval_model -d /path/to/image/folder/ -r /path/to/reconstruction/folder/ -a stf -p /path/to/checkpoint/ --cuda
CUDA_VISIBLE_DEVICES=0 python -m compressai.utils.eval_model -d /path/to/image/folder/ -r /path/to/reconstruction/folder/ -a cnn -p /path/to/checkpoint/ --cuda
The script for downloading OpenImages is provided in downloader_openimages.py. Please install fiftyone first.
Visualization of the reconstructed image kodim01.png.
Visualization of the reconstructed image kodim07.png.
RD curves
RD curves on Kodak.
RD curves on CLIC Professional Validation dataset.
Method | Enc (s) | Dec (s) | PSNR (dB) | bpp |
---|---|---|---|---|
CNN | 0.12 | 0.12 | 35.91 | 0.650 |
STF | 0.15 | 0.15 | 35.82 | 0.651 |
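For readers unfamiliar with the two quality columns, the sketch below shows the standard definitions of PSNR and bits per pixel (bpp) for 8-bit images. The helper names `psnr` and `bits_per_pixel` are hypothetical and not part of this repository; the evaluation script computes these metrics internally and may differ in detail.

```python
import math


def psnr(mse, max_val=255.0):
    """Peak signal-to-noise ratio in dB for a given mean squared error.

    Assumes pixel values in [0, max_val]; mse must be greater than zero.
    """
    return 10.0 * math.log10(max_val ** 2 / mse)


def bits_per_pixel(num_bytes, height, width):
    """Compressed size in bits divided by the number of image pixels."""
    return num_bytes * 8 / (height * width)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(psnr(mse=65.025))
    print(bits_per_pixel(num_bytes=49152, height=512, width=768))
```

Higher PSNR at the same bpp, or lower bpp at the same PSNR, indicates better rate-distortion performance.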
The pretrained models (optimized for MSE) were trained from scratch on 300k randomly chosen images from the OpenImages dataset.
Method | Lambda | Link |
---|---|---|
CNN | 0.0018 | cnn_0018 |
CNN | 0.0035 | cnn_0035 |
CNN | 0.0067 | cnn_0067 |
CNN | 0.025 | cnn_025 |
STF | 0.0018 | stf_0018 |
STF | 0.0035 | stf_0035 |
STF | 0.0067 | stf_0067 |
STF | 0.013 | stf_013 |
STF | 0.025 | stf_025 |
STF | 0.0483 | stf_0483 |
Additional pretrained models will be released over time.
@inproceedings{zou2022the,
title={The Devil Is in the Details: Window-based Attention for Image Compression},
author={Zou, Renjie and Song, Chunfeng and Zhang, Zhaoxiang},
booktitle={CVPR},
year={2022}
}