InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Difference between forward vs compress/decompress reconstruction #20

Closed navid-mahmoudian closed 3 years ago

navid-mahmoudian commented 3 years ago

Hello,

I have a question that would help me better understand your very useful library, and it would be great if you could add a similar example to your examples folder for others.

I ran a simple test and noticed a difference between the actual reconstruction (obtained with the compress/decompress functions) and the one obtained with the forward function. The difference shows up in both the reconstructed image and the estimated bits. If I clamp the output of the forward function, the reconstructions no longer differ, but the theoretical and actual bitrates still do. So I have two questions:

1- Does this mean that the compress and decompress functions clamp the results somewhere, i.e., there is no need to clamp the output ourselves?
2- Does the difference between the theoretical and actual bitrates come from the practical implementation of the encoder, which spends some extra bits on things such as an "end of file" symbol, the discretization of everything into bits, etc.?
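To make the two quantities in the second question concrete, here is a minimal plain-Python sketch. The likelihood values and the helper names `theoretical_bits` and `stream_bits` are made up for illustration and are not part of CompressAI. It only shows that an ideal code length is a sum of -log2(p) terms, while an encoded stream is measured in whole bytes.

```python
import math

def theoretical_bits(likelihoods):
    """Ideal code length in bits: the sum of -log2(p) over all symbols."""
    return sum(-math.log2(p) for p in likelihoods)

def stream_bits(byte_string):
    """Size in bits of an encoded byte string; always a multiple of 8."""
    return 8 * len(byte_string)

# Made-up likelihoods for four symbols (illustrative values only).
likelihoods = [0.5, 0.25, 0.25, 0.9]
ideal = theoretical_bits(likelihoods)

# Even a perfect coder must emit whole bytes, so the smallest possible
# payload is the ideal length rounded up to the next byte boundary.
smallest_payload = bytes(math.ceil(ideal / 8))

print(f"ideal: {ideal:.3f} bits")
print(f"smallest byte-aligned stream: {stream_bits(smallest_payload)} bits")
```

Byte alignment is only one possible source of overhead; a real entropy coder may add more for other reasons.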

Here is a simple code to test:

import math
import torch
from torchvision import transforms
from PIL import Image

def compute_theoretical_bits(out_net):
    list_latent_bits = [torch.ceil((torch.log(likelihoods).sum(dim=(1, 2, 3)) / (-math.log(2)))) for likelihoods in out_net['likelihoods'].values()]
    total_bits_per_image = torch.sum(torch.stack(list_latent_bits, dim=0), dim=0).long()
    return total_bits_per_image

def compute_actual_bits(compressed_stream):
    list_latent_bits = [torch.tensor([len(s) * 8 for s in list_s]) for list_s in compressed_stream["strings"]]
    total_bits_per_image = torch.sum(torch.stack(list_latent_bits, dim=0), dim=0)
    return total_bits_per_image

from compressai.zoo import bmshj2018_hyperprior

device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = bmshj2018_hyperprior(quality=2, pretrained=True).eval().to(device)
net.update(force=True)  # update the model CDFs parameters.

print(f'Parameters: {sum(p.numel() for p in net.parameters())}')
print(f'Entropy bottleneck(s) parameters: {sum(p.numel() for p in net.aux_parameters())}')

img = Image.open('../data/stmalo_fracape.png').convert('RGB')
x = transforms.ToTensor()(img).unsqueeze(0)
x = x.to(device)
with torch.no_grad():
    #output of training
    out_net = net(x)
    out_net['x_hat'].clamp_(0, 1)
    bits_per_image = compute_theoretical_bits(out_net)

    # output of real compression and decompression
    compressed = net.compress(x)
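    # Move the count to the same device as bits_per_image so the comparison below also works on GPU.
    compressed_bits_per_image = compute_actual_bits(compressed).to(device)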
    decompressed = net.decompress(compressed["strings"], compressed["shape"])
    # decompressed['x_hat'].clamp_(0, 1) # no need to clamp decompressed results?

    diff = (out_net["x_hat"] - decompressed["x_hat"]).abs()
    diff_in_bits = (bits_per_image - compressed_bits_per_image).abs()
    print("max difference={}, min difference={}".format(diff.max(), diff.min()))
    print("diff in bits={}, ratio (compressed/training)={}".format(diff_in_bits, torch.div(compressed_bits_per_image, bits_per_image)))

    isCloseReconstruction = torch.allclose(out_net["x_hat"], decompressed["x_hat"], atol=1e-06, rtol=0)
    isCloseBits = torch.allclose(bits_per_image, compressed_bits_per_image, atol=0, rtol=1e-2)
    assert isCloseReconstruction, "The output of decompressed image is not equal to image"
    assert isCloseBits, "The number of compressed bits is not equal to the number of bits computed in training phase"

jbegaint commented 3 years ago

Hi Navid, thanks for the comments!

Yes there should be a slight difference between forward and compress/decompress.

Like you noticed, we do clip to (0, 1) after the decompress functions (actually I found a missing clamp in the factorized prior model, I'll update the code). And we do not clip in forward (since this is mostly used for training).

Regarding the bpp, it should be close but usually slightly higher for the "actual" bitrates. First there's some inefficiency from the entropy coder due to implementation choices. Also we follow Ballé et al. and use a "scale table" to provide the closest probability distribution for each latent element (since it would be super slow to compute a pdf/cdf for each element), which can also cause a slight difference.
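The scale-table effect described above can be sketched in plain Python. Everything here is an assumption for illustration rather than CompressAI's actual code: the table bounds (0.11 to 256, 64 log-spaced levels), the rule of picking the smallest table entry not below the true scale, and the helper names. The sketch compares the expected code length when symbols are coded with their true distribution against the length when they are coded with the table's approximation.

```python
import bisect
import math

def log_spaced_table(lo=0.11, hi=256.0, levels=64):
    """Hypothetical scale table: `levels` values log-spaced between lo and hi."""
    step = (math.log(hi) - math.log(lo)) / (levels - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(levels)]

def gaussian_pmf(sigma, support):
    """Zero-mean Gaussian integrated over unit-width bins, renormalized on `support`."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))
    mass = [cdf(k + 0.5) - cdf(k - 0.5) for k in support]
    total = sum(mass)
    return [m / total for m in mass]

def expected_bits(p_true, p_model):
    """Average bits per symbol when data drawn from p_true is coded with p_model."""
    return sum(-p * math.log2(q) for p, q in zip(p_true, p_model) if p > 0)

sigma_true = 1.1                      # made-up scale predicted for one latent element
table = log_spaced_table()
# Smallest table entry that is not below the true scale (clamped to the last entry).
idx = min(bisect.bisect_left(table, sigma_true), len(table) - 1)
sigma_table = table[idx]

support = range(-6, 7)
p_true = gaussian_pmf(sigma_true, support)
p_table = gaussian_pmf(sigma_table, support)

ideal_bits = expected_bits(p_true, p_true)    # entropy of the true distribution
coded_bits = expected_bits(p_true, p_table)   # cross-entropy with the table entry

print(f"true scale {sigma_true}, table scale {sigma_table:.4f}")
print(f"ideal {ideal_bits:.5f} bits/symbol, with table {coded_bits:.5f} bits/symbol")
```

Coding with a mismatched distribution can never beat coding with the true one, so the table lookup can only add bits, although the gap per symbol is small when the table is finely spaced.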

Thanks for the comments! I'll try to update the docs/ add more examples to highlight this.

navid-mahmoudian commented 3 years ago

Thank you, Jean, and I wish you a pleasant end of the year.

navid-mahmoudian commented 3 years ago

Hello again. I ran into a new problem with the code above, and the solution is worth discussing. Previously I was running the code on the CPU and it worked properly, but when I tried to run it on the GPU I got the following error:

Traceback (most recent call last):
  File "test_2D_real_bit_vs_estimation.py", line 38, in <module>
    decompressed = net.decompress(compressed["strings"], compressed["shape"])
  File "/lib/python3.7/site-packages/compressai/models/priors.py", line 330, in decompress
    x_hat = self.g_s(yhat).clamp(0, 1)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Tensor for argument #1 'input' is on CPU, Tensor for argument #2 'output' is on CPU, but expected them to be on GPU (while checking arguments for slow_conv_transpose2d_out_cuda)

This shows that the convolution weights are on GPU, but the output of the decompress function is on the CPU. I think this is because the inputs of the decompress method are a string and a shape parameter which are on the CPU, and therefore, it continues to work on the CPU. However, inside the decompress function we should check where the network parameters are (CPU vs GPU) and convert everything to the place where the parameters of the network reside.
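A minimal sketch of that idea, assuming a hypothetical `TinyDecoder` module that stands in for a real model (it is not CompressAI code, and the actual fix lives in the entropy models): the decoded values start out as plain Python data, so the tensor is created directly on the device where the module's parameters live.

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Hypothetical stand-in for a synthesis transform; not CompressAI code."""

    def __init__(self):
        super().__init__()
        self.g_s = nn.ConvTranspose2d(2, 3, kernel_size=2, stride=2)

    def decompress(self, symbols, shape):
        # Look up where the network parameters live (CPU or GPU).
        device = next(self.parameters()).device
        # Create the decoded latent on that device instead of the default CPU.
        y_hat = torch.tensor(symbols, dtype=torch.float32, device=device)
        y_hat = y_hat.reshape(1, 2, *shape)
        return {"x_hat": self.g_s(y_hat).clamp(0, 1)}

device = "cuda" if torch.cuda.is_available() else "cpu"
net = TinyDecoder().eval().to(device)

with torch.no_grad():
    # Eight made-up decoded symbols: 2 channels of a 2x2 latent.
    out = net.decompress([0.0, 1.0, -1.0, 2.0] * 2, (2, 2))

print(out["x_hat"].shape, out["x_hat"].device)
```

Reading the device from the parameters keeps the caller's interface unchanged: the strings and shape can stay ordinary CPU-side Python objects.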

navid-mahmoudian commented 3 years ago

Hello again. I dug into the code to solve the issue above. Basically, several parts of "/compressai/entropy_models/entropy_models.py" need to be changed so that tensors are created on the proper device. I fixed the bug and opened a pull request, #21.

jbegaint commented 3 years ago

Thanks a lot for the report and the PR Navid!

Freed-Wu commented 2 years ago

However, if I clamp the output of the forward function then there is no difference in reconstruction results

For bmshj2018-factorized and bmshj2018-hyperprior, there is no difference. For mbt2018, there is a difference. I think it is because of the autoregression?

import os

import torch
from PIL import Image
from torchvision import transforms

from compressai.zoo import image_models

img1 = transforms.ToTensor()(Image.open(os.path.expanduser('~/Desktop/kodak_path/kodim02.png')))
img2 = transforms.ToTensor()(Image.open(os.path.expanduser('~/Desktop/kodak_path/kodim01.png')))
imgs = torch.stack([img1, img2])
model = image_models['mbt2018'](8, pretrained=True)
model.eval()
# model.update()
with torch.no_grad():
    # Reconstruction from real compression and decompression.
    codes = model.compress(imgs)
    x_hats = model.decompress(**codes)['x_hat'].permute(0, 2, 3, 1)
    # Reconstruction from the forward pass, clamped to [0, 1].
    labels = model(imgs)['x_hat'].permute(0, 2, 3, 1).clamp(0, 1)
# Exact element-wise comparison of the two reconstructions.
print((x_hats == labels).all())