duanzhiihao / lossy-vae

Authors' PyTorch implementation of lossy image compression methods based on hierarchical VAEs
54 stars 6 forks source link
lossy-compression lossy-image-compression pytorch vae variational-autoencoder

Lossy Image Compression using Hierarchical VAEs

This repository contains the authors' implementation of several deep learning-based methods related to lossy image compression.

Models

Implemented Methods (Pre-Trained Models Available)

Features

Progressive coding: our models learn a deep hierarchy of latent variables and compress/decompress images in a coarse-to-fine fashion. This feature comes from the hierarchical nature of ResNet VAEs.

Compression performance: our models are powerful in terms of both rate-distortion and decoding speed. Please see the results section below.

Results

Bpp-PSNR results in JSON format

Notes on metric computation:

Encoding/decoding latency on CPU/GPU, and BD-rate

| Model Name | CPU* Enc. | CPU* Dec. | 3080 ti Enc. | 3080 ti Dec. | BD-rate* (lower is better) | | :---------: | :-------: | :-------: | :----------: | :----------: | :------------------------: | | `qres34m` | 0.899s | 0.441s | 0.116s | 0.083s | -3.95 % | | `qarv_base` | 0.757s | 0.295s | 0.096s | 0.063s | -7.26 % |

Time is the latency to encode/decode a 512x768 image, averaged over 24 Kodak images. Tested in plain PyTorch (v1.13 + CUDA 11.7) code, ie, no mixed-precision, torchscript, ONNX/TensorRT, etc. \ CPU is Intel 10700k. \ *BD-rate is w.r.t. VTM 18.0, averaged on three common test sets (Kodak, Tecnick TESTIMAGES, and CLIC 2022 test set).

Install

Requirements:

Download and Install:

  1. Download the repository;
  2. Modify the dataset paths in lossy-vae/lvae/paths.py.
  3. [Optional] pip install the repository in development mode:
    cd /pasth/to/lossy-vae
    python -m pip install -e .

Usage

Get pre-trained weights

from lvae import get_model
model = get_model('qarv_base', pretrained=True) # weights are downloaded automatically
model.eval()
model.compress_mode(True) # initialize entropy coding

Compress images

Encode an image:

model.compress_file('/path/to/image.png', '/path/to/compressed.bits')

Decode an image:

im = model.decompress_file('/path/to/compressed.bits')
# im is a torch.Tensor of shape (1, 3, H, W). RGB. pixel values in [0, 1].

Datasets

COCO

  1. Download the COCO dataset "2017 Train images [118K/18GB]" from https://cocodataset.org/#download
  2. Unzip the images anywhere, e.g., at /path/to/datasets/coco/train2017
  3. Edit lossy-vae/lvae/paths.py such that
    known_datasets['coco-train2017'] = '/path/to/datasets/coco/train2017'

Kodak (link), Tecnick TESTIMAGES (link), and CLIC (link)

python scripts/download-dataset.py --name kodak         --datasets_root /path/to/datasets
                                          clic2022-test
                                          tecnick

Then, edit lossy-vae/lvae/paths.py such that known_datasets['kodak'] = '/path/to/datasets/kodak', and similarly for other datasets.

Custom Dataset

  1. Prepare a folder containing images. The folder should contain only images (may contain subfolders).
  2. Edit lossy-vae/lvae/paths.py such that known_datasets['custom-name'] = '/path/to/my-dataset', where custom-name is the name of your dataset, and /path/to/my-dataset is the path to the folder containing images.
  3. Then, you can use custom-name as the dataset name in the training/evaluation scripts.

Training and evaluation scripts

Training and evaluation scripts vary from model to model. For example, qres34m uses fixed-rate train/eval scheme, while qarv_base uses variable-rate train/eval scheme. \ Detailed training/evaluation instructions are provided in each model's subfolder (see the section Models).

License

Code in this repository is freely available for non-commercial use.