Random guess: the input images are in the incorrect range, e.g. [0, 1] when [0, 255] is expected. Thus, the input image is almost completely black, meaning that it is very easy to compress. Perhaps check the dataset dataloaders.
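One quick sanity check is to print the value range of the first batch straight out of the dataloader. A minimal sketch (the `check_batch_range` helper is hypothetical, not part of CompressAI):

```python
from torch.utils.data import DataLoader

def check_batch_range(loader: DataLoader) -> None:
    # Pull one batch and report its dtype and value range.
    batch = next(iter(loader))
    if isinstance(batch, (list, tuple)):  # some datasets yield (image, label)
        batch = batch[0]
    print(f"dtype={batch.dtype}, "
          f"min={batch.min().item():.4f}, max={batch.max().item():.4f}")
    # With the stock ToTensor() pipeline this should report roughly
    # min=0.0 and max=1.0; a max near 0.004 would hint at a double
    # division by 255 somewhere in the pipeline.
```

Calling it as `check_batch_range(train_dataloader)` before training starts would catch a scaling mistake early.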
Thank you for your response. I am using all of the stock training/evaluation code available in CompressAI. It is given below. For instance, in examples/train.py:
```python
train_transforms = transforms.Compose(
    [transforms.RandomCrop(args.patch_size), transforms.ToTensor()]
)
test_transforms = transforms.Compose(
    [transforms.CenterCrop(args.patch_size), transforms.ToTensor()]
)
```
Here, `transforms.ToTensor()` converts a PIL Image in [0, 255] to a float tensor in [0, 1].
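A tiny check confirms that behaviour (toy image, my own snippet rather than code from the repository):

```python
from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (4, 4), color=(255, 128, 0))  # toy uint8 image
t = transforms.ToTensor()(img)
print(t.dtype, t.min().item(), t.max().item())  # torch.float32 0.0 1.0
```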
Similarly, in eval_model/__main__.py:
```python
def read_image(filepath: str) -> torch.Tensor:
    assert filepath.is_file()
    img = Image.open(filepath).convert("RGB")
    return transforms.ToTensor()(img)
```
To my surprise, when I downloaded the original pretrained hyperprior model and ran the evaluation code, it reproduced the original numbers (the ones you provide in CompressAI). I am unsure what exactly the problem could be. I also checked the reconstructed images `x_hat`, and they look normal.

What test should I run to ensure everything in the pipeline is correct? What do you suggest?
Some other possibilities:

- `x` is given to the decoder side, and that ends up influencing `x_hat`.
- The PSNR is not measured between `PSNR(x, x_hat)` but instead between `PSNR(x, Q(x))` or `PSNR(x_hat, Q(x_hat))`, or something like that.
- The output of `compress` is in a different format -- ensure all the keys and lists are laid out in exactly the same way as in the reference model implementations from CompressAI (see the sketch below).

What are the changes you made to the script/repository? Does training work without those changes?
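For reference, this is the structure the hyperprior models in CompressAI produce; a sketch with a toy input that a modified model can be compared against:

```python
import torch
from compressai.zoo import bmshj2018_hyperprior

net = bmshj2018_hyperprior(quality=2, pretrained=True).eval()
x = torch.rand(1, 3, 256, 256)  # toy input, values in [0, 1]

out_enc = net.compress(x)
# Expected structure for the hyperprior family:
#   out_enc["strings"] -> [y_strings, z_strings], each a list holding one
#                         byte string (bytes) per image in the batch
#   out_enc["shape"]   -> spatial size of z, needed by decompress()
assert isinstance(out_enc["strings"][0][0], bytes)

out_dec = net.decompress(out_enc["strings"], out_enc["shape"])
# out_dec["x_hat"] is the reconstruction, same spatial size as x
```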
Thank you for your response. I rechecked my pipeline, and I agree with your first point: `x` was indeed passed to the decoder side (unknowingly), which likely caused this issue. Although it was not intended and is outside the scope of CompressAI, could you maybe explain why exactly we cannot pass `x` to the decoder? (I looked at some literature but couldn't find a satisfactory answer.) In my opinion, passing `x` to the decoder should only enhance the image reconstruction task without influencing the compression performance (bitrate) itself much. Is it because of the tradeoff between rate and distortion, or am I missing something very obvious?
A realistic decoder should only receive compressed data. The amount of data it receives is proportional to the rate. If the decoder receives x, then it should (i) simply output x, since that is the best quality reconstruction, and (ii) include the size of uncompressed x in the rate calculation.
Using x to enhance the image reconstruction task should only be done at the encoder side. The enhancement will cost some amount of bits, which is a tradeoff.
An encoder may contain part (or all) of the decoder, if desired.
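To put rough numbers on that tradeoff (a back-of-the-envelope illustration of my own, not from the thread):

```python
# Rate of sending x uncompressed: 8-bit RGB costs 3 channels * 8 bits
# = 24 bits per pixel, independent of image size.
raw_bpp = 3 * 8
print(raw_bpp)  # 24

# Learned codecs typically report well under 1 bpp on Kodak, so a decoder
# that receives x "for free" hides essentially all of the true rate.
```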
Thank you for your answer. Just one quick follow-up question: what exactly does this mean?

> (ii) include the size of uncompressed x in the rate calculation.

Aren't we already using the uncompressed `x` in the rate calculation? For instance, in the code below, bpp is normalized by the number of pixels in `x`. What should we add to calculate the rate if, hypothetically, we give `x` or feature maps of `x` to the decoder?
```python
# Number of pixels in the batch: N * H * W (channels are not counted).
num_pixels = x.size(0) * x.size(2) * x.size(3)
# Total length of all entropy-coded byte strings, in bits, per pixel.
bpp = sum(len(s[0]) for s in out_enc["strings"]) * 8.0 / num_pixels
```
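(One way to account for that hypothetical case, sketched here for concreteness rather than taken from CompressAI: charge the serialized size of whatever extra tensor the decoder receives to the bitstream.)

```python
# Hypothetical accounting (a sketch, not CompressAI code): if the decoder
# is also handed x (or feature maps of x), that side channel is part of
# the transmitted data and must be charged to the rate.
extra_bytes = x.numel() * x.element_size()  # raw byte size of the extra tensor
total_bytes = sum(len(s[0]) for s in out_enc["strings"]) + extra_bytes
bpp = total_bytes * 8.0 / num_pixels
# Even stored as 8-bit RGB, x alone costs 24 bpp; as float32 it is 96 bpp.
# Either dwarfs a learned codec's typical rate, which is why the numbers
# look unrealistically good when x leaks into the decoder.
```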
The bpp computation above relies only upon x.size, which can be considered a tuple of approximately 6 bytes. If that is the only usage of x within the decoder, the problem may be something else. Check what s[0] contains. It expects a byte string; is it a list/dict instead?
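A quick way to check that (a hypothetical snippet, not part of the evaluation script):

```python
for i, s in enumerate(out_enc["strings"]):
    print(f"strings[{i}]: outer={type(s).__name__}, inner={type(s[0]).__name__}")
    # Reference models put a list of byte strings here, so type(s[0])
    # should be bytes; a list or dict means compress() returns a
    # different structure than the evaluation script expects.
    assert isinstance(s[0], bytes), f"expected bytes, got {type(s[0])}"
```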
Bug
Hi all, thank you for a great compression library. I am using the hyperprior architecture (Ballé et al., ICLR 2018), but I modified the encoder and decoder (added a few layers, etc.). I trained this modified architecture for 50 epochs on the Vimeo90K triplet dataset with quality=2 and batch_size=32. During training, the test loss was almost 0, so I stopped the training. However, when I evaluated the model (using the scripts available in CompressAI), the results on the Kodak dataset were very unrealistic; the JSON is given below. Can you please advise whether this is a bug in the evaluation code, or what else could be the reason for these crazy numbers? I believe I am doing something very stupid :(
Thanks