Hi Navid, thanks for the comments!
Yes, there should be a slight difference between `forward` and `compress`/`decompress`.
As you noticed, we do clamp to (0, 1) after the `decompress` functions (actually, I found a missing clamp in the factorized prior model, I'll update the code), and we do not clamp in `forward`, since that path is mostly used for training.
Regarding the bpp, the two values should be close, but the "actual" bitrate is usually slightly higher. First, there is some inefficiency in the entropy coder due to implementation choices. Also, we follow Ballé et al. and use a "scale table" to provide the closest pre-computed probability distribution for each latent element (since it would be very slow to compute a pdf/cdf per element), which can also cause a slight difference.
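Concretely, the two numbers can be compared along these lines (a rough sketch only; the model, the random input, and the variable names are just placeholders for illustration):

```python
import torch
from compressai.zoo import bmshj2018_factorized  # any zoo model works the same way

net = bmshj2018_factorized(quality=4, pretrained=True).eval()
x = torch.rand(1, 3, 256, 256)  # stand-in for a real image in [0, 1]
num_pixels = x.size(0) * x.size(2) * x.size(3)

with torch.no_grad():
    out_net = net(x)           # forward(): likelihoods used for the estimated bpp
    out_enc = net.compress(x)  # compress(): the actual byte strings

# Estimated bpp from the likelihoods returned by forward()
est_bpp = sum(
    -torch.log2(lkl).sum() / num_pixels for lkl in out_net["likelihoods"].values()
)
# Actual bpp from the length of the encoded strings
real_bpp = sum(len(s[0]) for s in out_enc["strings"]) * 8.0 / num_pixels

print(float(est_bpp), real_bpp)  # real_bpp is usually slightly above est_bpp
```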
Thanks for the comments! I'll try to update the docs and add more examples to highlight this.
Thank you, Jean, and I wish you a nice and pleasant end of the year.
Hello again. While running the above code, I came across a new problem whose solution is worth discussing. Before, I was running the code on the CPU and it worked properly, but when I tried to run it on the GPU I got the following error:
```
Traceback (most recent call last):
  File "test_2D_real_bit_vs_estimation.py", line 38, in <module>
    decompressed = net.decompress(compressed["strings"], compressed["shape"])
  File "/lib/python3.7/site-packages/compressai/models/priors.py", line 330, in decompress
    x_hat = self.g_s(y_hat).clamp_(0, 1)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 929, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Tensor for argument #1 'input' is on CPU, Tensor for argument #2 'output' is on CPU, but expected them to be on GPU (while checking arguments for slow_conv_transpose2d_out_cuda)
```
This shows that the convolution weights are on the GPU, while the tensors fed to the decoder inside `decompress` are on the CPU. I think this is because the inputs of the `decompress` method are byte strings and a shape parameter, which carry no device information, so everything it creates stays on the CPU. Inside `decompress`, we should therefore check where the network parameters reside (CPU vs. GPU) and move everything onto that device.
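In the meantime, a minimal user-side workaround could look like the sketch below (`safe_decompress` is a hypothetical helper, not part of CompressAI): run the entropy decoding with the model on the CPU and move the reconstruction back afterwards.

```python
import torch

def safe_decompress(net, strings, shape):
    # Hypothetical workaround: run the whole decompression on the CPU, then move
    # the reconstruction back to the device where the model's parameters live.
    # (The proper fix is inside the entropy models, creating their tensors
    # directly on the parameters' device.)
    device = next(net.parameters()).device
    net.to("cpu")
    out = net.decompress(strings, shape)
    net.to(device)
    out["x_hat"] = out["x_hat"].to(device)
    return out
```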
Hello again, I dug into the code to solve the above issue. Basically, several parts of "/compressai/entropy_models/entropy_models.py" have to be changed so that the tensors are created on the proper device. I fixed the bug and opened a pull request in #21.
Thanks a lot for the report and the PR, Navid!
> However, if I clamp the output of the `forward` function then there is no difference in reconstruction results
For bmshj2018-factorized/bmshj2018-hyperprior, there is no difference. For mbt2018, there is a difference. I think it is because of the autoregressive context model?
```python
import os

import torch
from PIL import Image
from torchvision import transforms

from compressai.zoo import image_models

# Load two Kodak images and stack them into a batch
img1 = transforms.ToTensor()(Image.open(os.path.expanduser('~/Desktop/kodak_path/kodim02.png')))
img2 = transforms.ToTensor()(Image.open(os.path.expanduser('~/Desktop/kodak_path/kodim01.png')))
imgs = torch.stack([img1, img2])

model = image_models['mbt2018'](8, pretrained=True)
model.eval()
# model.update()  # not needed here: the pretrained weights already include the entropy tables

# Actual codec path: entropy-encode to byte strings, then decode
codes = model.compress(imgs)
x_hats = model.decompress(**codes)['x_hat'].detach().permute(0, 2, 3, 1)

# Training-style forward pass, clamped to (0, 1) for a fair comparison
labels = model(imgs)['x_hat'].detach().permute(0, 2, 3, 1).clamp(0, 1)

print((x_hats == labels).all())
```
Hello,
I have a question to better understand your very useful and nice library, and it would be great if you could add a similar example to the examples folder for others.
I did a simple test and noticed there is a difference between the actual reconstruction results (obtained by the `compress`/`decompress` functions) and the ones obtained by the `forward` function. The difference shows up both in the reconstructed images and in the estimated bits. However, if I `clamp` the output of the `forward` function, then there is no difference in the reconstruction results, but there is still a difference between the theoretical and the actual bitrates. So I have two questions in that regard:

1. Does this mean that the `compress` and `decompress` functions somehow clamp the results, i.e., there is no need to clamp the output ourselves?
2. Does the difference between the theoretical and actual bitrates come from the practical implementation of the encoder, which imposes some extra bits for things such as the "end of file" symbol, the discretization of everything into bits, etc.?

Here is a simple code to test: