InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Image size=(672, 512) cannot be processed by most models #252

Open · mynotwo opened this issue 1 year ago

mynotwo commented 1 year ago

Hi, I'm using your Colab tutorial to compress some arbitrarily sized images, and I found that images of size (672, 512) cannot be processed. Could you fix this bug?

YodaEmbedding commented 1 year ago

There are different ways to deal with this:

  1. Pad the image until each dimension is a multiple of 64 (e.g. 672x512 becomes 704x512).
  2. A memory-efficient "smart padding" method proposed by Huawei (IIRC).

See below for example code.
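For (1), a minimal sketch using compressai.ops.compute_padding (the same helper the smart-padding code below imports, assuming its usual (pad, unpad) return convention); net and x are assumed to be a compressai.zoo model and a (1, 3, H, W) image tensor in [0, 1]:

import torch.nn.functional as F
from compressai.ops import compute_padding

h, w = x.shape[-2:]
# Pad up to the next multiple of 64 = 2**6 (one factor of 2 per strided conv);
# e.g. (672, 512) becomes (704, 512). `unpad` is the matching negative padding.
pad, unpad = compute_padding(h, w, min_div=2**6)
out_enc = net.compress(F.pad(x, pad, mode="constant", value=0))
x_hat = F.pad(net.decompress(**out_enc)["x_hat"], unpad)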

mynotwo commented 1 year ago

> There are different ways to deal with this. One way is to pad the image until it is a multiple of 64.

But I want to compress the image. Wouldn't padding affect the compression ratio?

YodaEmbedding commented 1 year ago

I believe pre-padding before a transform is a necessary evil that even some non-learned codecs use, though I'd guess they've become more efficient at it over the years. The overhead is also bounded: for your 672x512 example, padding to 704x512 adds only 32/672 ≈ 4.8% more pixels, and a constant zero region is cheap to entropy code. Adaptive entropy coding likely mitigates the cost of the additional data further, and the DFT/DCT/DWT of a zero-padded signal may not be so bad, thanks to properties of those transforms.

Worst-case test

Testing by padding an extra 64x64:

import torch
import torch.nn.functional as F
from compressai.zoo import bmshj2018_hyperprior
from PIL import Image
from torchvision import transforms

device = "cuda"
net = bmshj2018_hyperprior(quality=2, pretrained=True).eval().to(device)
img = Image.open("/data/datasets/kodak/test/kodim01.png").convert("RGB")
x = transforms.ToTensor()(img).unsqueeze(0).to(device)

def inference(net, x, pad):
    # `unpad` is the negative of `pad`, so F.pad with it crops the border back off.
    unpad = tuple(-p for p in pad)
    with torch.no_grad():
        out_enc = net.compress(F.pad(x, pad))
        out_dec = net.decompress(**out_enc)
    # Total compressed size: latent (y) strings plus hyper-latent (z) strings.
    y_strings, z_strings = out_enc["strings"]
    size = sum(len(s) for s in y_strings) + sum(len(s) for s in z_strings)
    x_hat = F.pad(out_dec["x_hat"], unpad).clamp_(0, 1)
    psnr = -10 * ((x_hat - x) ** 2).mean().log10()  # PSNR in dB for inputs in [0, 1]
    return size, psnr

size, psnr = inference(net, x, pad=(0, 0, 0, 0))
print(f"No pad:    {size} bytes  {psnr:.2f} dB")

size, psnr = inference(net, x, pad=(0, 64, 0, 64))
print(f"Extra pad: {size} bytes  {psnr:.2f} dB")

bmshj2018-hyperprior:

No pad:    16500 bytes  26.88 dB
Extra pad: 16920 bytes  26.89 dB  (2.5% increase in bytes)

mbt2018:

No pad:    14680 bytes  27.14 dB
Extra pad: 14824 bytes  27.16 dB  (1% increase in bytes)

Yes, it looks like there is a small cost, particularly for entropy models that are weaker at exploiting spatial redundancy: the hyperprior-only bmshj2018 pays more for the padding than mbt2018, whose autoregressive context model adapts better to the padded region.


Smart padding

Another method I've seen proposed by Huawei (IIRC) is to pad to a multiple of 2 (i.e., even dimensions) just before each 2x downsampling operation, and to crop correspondingly after each 2x upsampling. But you'll need to interleave these operations with the model's layers. Perhaps define a SmartPadding layer:

import torch.nn as nn
import torch.nn.functional as F
from compressai.ops import compute_padding

class SmartPadding(nn.Module):
    def __init__(self, min_div=2, method="pad", link=None, **padding_kwargs):
        super().__init__()
        self.min_div = min_div
        self.padding_kwargs = padding_kwargs
        self.method = method
        self._unpad = None
        # Keep the linked layer in a plain list so nn.Module does not register
        # it as a submodule (avoiding duplicate registration and cycles).
        self._linked_modules = []
        if link is not None:
            self._linked_modules.append(link)

    def forward(self, x):
        *_, h, w = x.shape
        if self.method == "pad":
            # Pad up to the next multiple of min_div and remember how to undo it.
            padding, self._unpad = compute_padding(h, w, min_div=self.min_div)
        elif self.method == "unpad":
            # Crop using the negative padding recorded by the linked "pad" layer.
            padding = self._linked_modules[0]._unpad
        print(self.method, h, w, padding)
        return F.pad(x, padding, **self.padding_kwargs)
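A quick (hypothetical) standalone check of the pad/unpad pairing; the printed values match the trace shown further below:

import torch

pad_layer = SmartPadding(min_div=2, mode="constant", value=0)
unpad_layer = SmartPadding(method="unpad", link=pad_layer)

x = torch.rand(1, 3, 17, 25)
y = pad_layer(x)    # prints: pad 17 25 (0, 1, 0, 1)      -> shape (1, 3, 18, 26)
z = unpad_layer(y)  # prints: unpad 18 26 (0, -1, 0, -1)  -> shape (1, 3, 17, 25)
assert z.shape == x.shape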

And then define a method for injecting these layers into an existing model:

def inject_model(self, **kwargs):
    # Insert SmartPadding layers into a bmshj2018-style model, whose g_a, g_s,
    # h_a, and h_s are nn.Sequential. The commented-out Sequentials below show
    # the resulting layer order after each group of insertions.
    # self.g_a = nn.Sequential(
    #     SmartPadding(**kwargs),
    #     conv(3, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, M),
    # )

    self.g_a.insert(0, SmartPadding(**kwargs))
    self.g_a.insert(3, SmartPadding(**kwargs))
    self.g_a.insert(6, SmartPadding(**kwargs))
    self.g_a.insert(9, SmartPadding(**kwargs))

    # self.g_s = nn.Sequential(
    #     deconv(M, N),
    #     SmartPadding(method="unpad", link=self.g_a[9], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.g_a[6], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.g_a[3], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, 3),
    #     SmartPadding(method="unpad", link=self.g_a[0], **kwargs),
    # )

    self.g_s.insert(1, SmartPadding(method="unpad", link=self.g_a[9], **kwargs))
    self.g_s.insert(4, SmartPadding(method="unpad", link=self.g_a[6], **kwargs))
    self.g_s.insert(7, SmartPadding(method="unpad", link=self.g_a[3], **kwargs))
    self.g_s.insert(10, SmartPadding(method="unpad", link=self.g_a[0], **kwargs))

    # self.h_a = nn.Sequential(
    #     conv(M, N, stride=1, kernel_size=3),
    #     nn.ReLU(inplace=True),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     nn.ReLU(inplace=True),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    # )

    self.h_a.insert(2, SmartPadding(**kwargs))
    self.h_a.insert(5, SmartPadding(**kwargs))

    # self.h_s = nn.Sequential(
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.h_a[5], **kwargs),
    #     nn.ReLU(inplace=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.h_a[2], **kwargs),
    #     nn.ReLU(inplace=True),
    #     conv(N, M, stride=1, kernel_size=3),
    #     nn.ReLU(inplace=True),
    # )

    self.h_s.insert(1, SmartPadding(method="unpad", link=self.h_a[5], **kwargs))
    self.h_s.insert(4, SmartPadding(method="unpad", link=self.h_a[2], **kwargs))

...And finally, let's inject and compare the various methods:

(Setup and the inference function are identical to the worst-case test above.)

size, psnr = inference(net, x, pad=(0, 0, 0, 0))
print(f"No pad:    {size} bytes  {psnr:.2f} dB")

size, psnr = inference(net, x, pad=(0, 64, 0, 64))
print(f"Extra pad: {size} bytes  {psnr:.2f} dB")

# Inject the SmartPadding layers, then make the input deliberately odd-sized
# (513x769) so every downsampling stage actually has to pad.
inject_model(net, mode="constant", value=0)
size, psnr = inference(net, x, pad=(0, 1, 0, 1))
print(f"Smart pad: {size} bytes  {psnr:.2f} dB")

Output:

No pad:    16500 bytes  26.88 dB
Extra pad: 16920 bytes  26.89 dB
Smart pad: 17100 bytes  26.78 dB  (3.6% increase in bytes)

Smart padding in action:
method height width padding
pad 513 769 (0, 1, 0, 1)
pad 257 385 (0, 1, 0, 1)
pad 129 193 (0, 1, 0, 1)
pad 65 97 (0, 1, 0, 1)
pad 33 49 (0, 1, 0, 1)
pad 17 25 (0, 1, 0, 1)
unpad 18 26 (0, -1, 0, -1)
unpad 34 50 (0, -1, 0, -1)
unpad 66 98 (0, -1, 0, -1)
unpad 130 194 (0, -1, 0, -1)
unpad 258 386 (0, -1, 0, -1)
unpad 514 770 (0, -1, 0, -1)

...Hmm, it's not as good as I had hoped, but this was without any special training: we're padding tensors with zeros in the middle of the network and hoping that doesn't perturb their behavior too much. Perhaps padding-adaptive models could be trained to close the gap.

WenBingo commented 6 months ago


Hello, I would like to ask a question about installing CompressAI. The install reports "Successfully installed compressai-1.2.4.dev0", but importing it fails with the following error:

>>> import compressai
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\compressai\compressai\__init__.py", line 30, in <module>
    from compressai import (
  File "F:\compressai\compressai\latent_codecs\__init__.py", line 38, in <module>
    from .rasterscan import RasterScanLatentCodec
  File "F:\compressai\compressai\latent_codecs\rasterscan.py", line 38, in <module>
    from compressai.ans import BufferedRansEncoder, RansDecoder
ImportError: DLL load failed while importing ans: The specified procedure could not be found.