Justin-Tan / high-fidelity-generative-compression

PyTorch implementation of High-Fidelity Generative Image Compression, plus routines for neural image compression
Apache License 2.0

high-fidelity-generative-compression

PyTorch implementation of the paper "High-Fidelity Generative Image Compression" by Mentzer et al. This repo also provides general utilities for lossless compression that interface with PyTorch. For the official TensorFlow code release, see the TensorFlow Compression repo.

About

This repository defines a model for learnable image compression based on the paper "High-Fidelity Generative Image Compression" (HIFIC) by Mentzer et al. The model can compress images of arbitrary size and resolution by up to two orders of magnitude while maintaining perceptually similar reconstructions. Its outputs tend to be more visually pleasing than those of standard image codecs operating at higher bitrates.

This repository also includes a partial PyTorch port of the TensorFlow Compression library, which provides general tools for neural image compression.


You can play with a demonstration of the model in Colab, where you can upload and compress your own images.

Example

[Images: Original | HiFIC]
Original: (6.01 bpp - 2100 kB) | HiFIC: (0.160 bpp - 56 kB). Ratio: 37.5.

The image shown is an out-of-sample instance from the CLIC-2020 dataset. The HiFIC image is the reconstruction produced by one of the learned models provided below.

Note that the learned model was not adapted in any way for evaluation on this image. More sample outputs from this model can be found at the end of the README and in EXAMPLES.md.
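The figures in the caption above follow from simple arithmetic: bits per pixel is the encoded size in bits divided by the number of pixels, and the compression ratio is the original size over the compressed size, in the same unit. A minimal sketch (the helper names are illustrative and not part of this repository) showing that the ratio computed from bpp (about 37.56) and from file size in kB (37.5) agree up to rounding of the reported numbers:

```python
def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Encoded size in bits divided by the number of pixels."""
    return num_bytes * 8 / (height * width)


def compression_ratio(original: float, compressed: float) -> float:
    """Original size over compressed size; both must use the same unit."""
    return original / compressed


# Caption figures: ratio from bpp and ratio from file size in kB.
ratio_from_bpp = compression_ratio(6.01, 0.160)
ratio_from_kb = compression_ratio(2100, 56)
print(f"{ratio_from_bpp:.2f} {ratio_from_kb:.2f}")
```

The image dimensions are not stated here, so bits_per_pixel cannot be applied to the 2100 kB and 56 kB file sizes directly. Given the height and width, it would recover the bpp figures.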

Note

The generator is trained to produce realistic rather than exact reconstructions. It may synthesize certain portions of a given image to remove artifacts associated with lossy compression, so in principle a decoded image may differ arbitrarily from the input. This precludes use in sensitive applications. An important caveat from the authors is reproduced here:

"Therefore, we emphasize that our method is not suitable for sensitive image contents, such as, e.g., storing medical images, or important documents."

Usage

git clone https://github.com/Justin-Tan/high-fidelity-generative-compression.git
cd high-fidelity-generative-compression
pip install -r requirements.txt

To check that your setup is working, run python3 -m src.model from the repository root. Further usage instructions can be found in the user's guide.

Training

# Train initial autoencoding model
python3 train.py --model_type compression --regime low --n_steps 1e6
# Train using full generator-discriminator loss
python3 train.py --model_type compression_gan --regime low --n_steps 1e6 --warmstart --ckpt path/to/base/checkpoint
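The two-stage recipe above (first train the autoencoder with --model_type compression, then warm-start the generator-discriminator stage from that checkpoint) can be scripted. This is a hedged sketch that reuses only the flags shown in the commands above. train_command and run_two_stage are hypothetical helpers, and base_ckpt is a placeholder, since where stage 1 writes its checkpoint is not specified here. By default it only prints the commands:

```python
import subprocess
from typing import List, Optional


def train_command(model_type: str, regime: str, n_steps: str = "1e6",
                  ckpt: Optional[str] = None) -> List[str]:
    """Build a train.py argument list using only the flags shown above."""
    cmd = ["python3", "train.py", "--model_type", model_type,
           "--regime", regime, "--n_steps", n_steps]
    if ckpt is not None:
        # Warm-start from an existing checkpoint (stage 2 of the recipe).
        cmd += ["--warmstart", "--ckpt", ckpt]
    return cmd


def run_two_stage(regime: str, base_ckpt: str, dry_run: bool = True) -> List[List[str]]:
    """Stage 1: autoencoder only. Stage 2: full generator-discriminator loss."""
    stages = [
        train_command("compression", regime),
        train_command("compression_gan", regime, ckpt=base_ckpt),
    ]
    for cmd in stages:
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            # Assumes the current working directory is the repository root.
            subprocess.run(cmd, check=True)
    return stages


run_two_stage("low", "path/to/base/checkpoint")
```

Keeping dry_run as the default means the script can be checked against the documented commands before committing to a long training run.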

Compression

python3 compress.py -i path/to/image/dir -ckpt path/to/trained/model --reconstruct

The compressed format can be transmitted and decoded using the routines in compress.py. The Colab demo illustrates the decoding process.
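In learned codecs, the size of the compressed format is set by the entropy model. An entropy coder spends close to -log2(p) bits on a symbol to which the model assigns probability p, so summing these terms over the quantized latents and dividing by the pixel count estimates the bpp. A minimal, framework-free sketch of that calculation follows. estimated_bpp is a hypothetical helper, and this README does not describe exactly how compress.py computes its reported bitrate:

```python
import math
from typing import Iterable


def estimated_bpp(likelihoods: Iterable[float], height: int, width: int) -> float:
    """Ideal code length in bits per pixel for a set of symbol probabilities.

    likelihoods: probability the entropy model assigns to each coded symbol,
    each in (0, 1]. height, width: size of the original image in pixels.
    """
    total_bits = sum(-math.log2(p) for p in likelihoods)
    return total_bits / (height * width)


# 16 symbols, each with probability 1/2, for a 4x4 image: 1 bit per pixel.
print(estimated_bpp([0.5] * 16, 4, 4))
```

Practical entropy coders add a small overhead on top of this ideal length, so the actual compressed file is slightly larger than the estimate.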

Pretrained Models

| Target bitrate (bpp) | Weights | Training instructions |
| --- | --- | --- |
| 0.14 | HIFIC-low | python3 train.py --model_type compression_gan --regime low --warmstart -ckpt path/to/trained/model -nrb 9 -norm |
| 0.30 | HIFIC-med | python3 train.py --model_type compression_gan --regime med --warmstart -ckpt path/to/trained/model --likelihood_type logistic |
| 0.45 | HIFIC-high | python3 train.py --model_type compression_gan --regime high --warmstart -ckpt path/to/trained/model -nrb 9 -norm |
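If you have a bitrate budget in mind, one simple heuristic is to pick the pretrained model whose nominal target is closest to it. Here is a sketch using the targets from the table (closest_model and PRETRAINED_TARGETS are illustrative names, not part of this repository):

```python
# Nominal target bitrates (bpp) of the pretrained models, from the table above.
PRETRAINED_TARGETS = {"HIFIC-low": 0.14, "HIFIC-med": 0.30, "HIFIC-high": 0.45}


def closest_model(target_bpp: float) -> str:
    """Name of the pretrained model whose nominal target is closest to target_bpp."""
    return min(PRETRAINED_TARGETS,
               key=lambda name: abs(PRETRAINED_TARGETS[name] - target_bpp))


print(closest_model(0.2))
```

The targets are nominal. The bitrate actually achieved depends on image content, and the HIFIC-med examples below range from 0.209 to 0.565 bpp.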

Examples

The samples below are taken from the CLIC2020 dataset, external to the training set. Bitrates are reported in bits per pixel (bpp). The reconstructions are produced using the HIFIC-med model above (target bitrate 0.3 bpp). Try to guess which image in each pair is the original (images are saved as PNG for viewing, best viewed widescreen). The answer is given below each pair.

For more examples, see EXAMPLES.md. For even more, see this shared folder (the images in it were generated using the HIFIC-low model).

[Image 1: panels A, B]
Original: A (11.8 bpp) | HIFIC: B (0.269 bpp). Ratio: 43.8

[Image 2: panels A, B]
Original: A (14.6 bpp) | HIFIC: B (0.330 bpp). Ratio: 44.2

[Image 3: panels A, B]
Original: A (12.3 bpp) | HIFIC: B (0.209 bpp). Ratio: 58.9

[Image 4: panels A, B]
Original: B (19.9 bpp) | HIFIC: A (0.565 bpp). Ratio: 35.2

The last two images show interesting failure modes. Small figures in the distance are almost entirely removed (at the top of the central rock in Image 3), and the bitrate required by the model increases significantly when the image is dominated by high-frequency components (Image 4 needs 0.565 bpp against a 0.3 bpp target).
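One rough way to gauge how much high-frequency content an image contains is to measure the fraction of its spectral energy above a cutoff frequency. The sketch below is an illustrative heuristic only. It is not part of HiFIC, and both high_frequency_energy_fraction and the 0.25 cycles/pixel default cutoff are assumptions. A fine checkerboard scores near 1 and a slowly varying cosine scores near 0:

```python
import numpy as np


def high_frequency_energy_fraction(image: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a radial frequency cutoff.

    image: 2-D grayscale array. cutoff: radius in cycles per pixel
    (each axis runs from -0.5 to 0.5 cycles per pixel).
    """
    img = image.astype(np.float64)
    img = img - img.mean()                       # remove the DC component
    power = np.abs(np.fft.fft2(img)) ** 2        # power spectrum
    fy = np.fft.fftfreq(img.shape[0])[:, None]   # row frequencies
    fx = np.fft.fftfreq(img.shape[1])[None, :]   # column frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0:
        return 0.0                               # constant image: no detail
    return float(power[radius > cutoff].sum() / total)


checker = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)
smooth = np.tile(np.cos(2 * np.pi * np.arange(32) / 32), (32, 1))
print(high_frequency_energy_fraction(checker), high_frequency_energy_fraction(smooth))
```

A score like this only hints at why texture-heavy scenes such as Image 4 are expensive. The model's actual bitrate is determined by its learned entropy model, not by this measure.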

Authors

Acknowledgements

Contributing

All content in this repository is licensed under the Apache-2.0 license. Please open an issue if you encounter unexpected behaviour, or have corrections/suggestions to contribute.

Citation

This is a PyTorch port of the original implementation. Please cite the original paper if you use their work.

@article{mentzer2020high,
  title={High-Fidelity Generative Image Compression},
  author={Mentzer, Fabian and Toderici, George and Tschannen, Michael and Agustsson, Eirikur},
  journal={arXiv preprint arXiv:2006.09965},
  year={2020}
}