InterDigitalInc / CompressAI

A PyTorch library and evaluation platform for end-to-end compression research
https://interdigitalinc.github.io/CompressAI/
BSD 3-Clause Clear License

Unexpected results on Kodak dataset #190

Closed danishnazir closed 1 year ago

danishnazir commented 1 year ago

Bug

Hi all, thank you for a great compression library. I am using the hyperprior architecture (Ballé et al., ICLR 2018), but I modified the encoder and decoder (added a few layers, etc.). I trained this modified architecture for 50 epochs on the Vimeo90K triplet dataset with quality = 2 and batch_size = 32. During training, the test loss was almost 0, so I stopped the training. However, when I evaluated the model (using the scripts available in CompressAI), the results on the Kodak dataset were very unrealistic. The JSON is given below.

{
  "name": "bmshj2018-modified-hyperprior-mse",
  "description": "Inference (ans)",
  "results": {
    "psnr": [
      90.76645755767822
    ],
    "ms-ssim": [
      0.9999999925494194
    ],
    "bpp": [
      0.0003255208333333332
    ],
    "encoding_time": [
      0.11552197734514873
    ],
    "decoding_time": [
      0.03114013870557149
    ]
  }
}

Can you please advise whether this is a bug in the evaluation code, or what else could be the reason for these crazy numbers? I believe I am doing something very stupid :(

Thanks

YodaEmbedding commented 1 year ago

Random guess: the input images are in the incorrect range, e.g. [0, 1] when [0, 255] is expected. Thus, the input image is almost completely black, meaning that it is very easy to compress. Perhaps check the dataset dataloaders.
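One quick way to check is to print the value range of a batch coming out of the dataloader. A minimal sketch (the helper and the dummy batch below are illustrative, not CompressAI code):

```python
import torch


def check_batch_range(batch: torch.Tensor) -> None:
    """Sanity-check that a batch from a ToTensor()-based pipeline is in [0, 1]."""
    lo, hi = batch.min().item(), batch.max().item()
    print(f"min={lo:.4f} max={hi:.4f} mean={batch.mean().item():.4f}")
    # An almost-black batch (max close to 0) would compress to near-zero bpp.
    assert 0.0 <= lo and hi <= 1.0, "values outside the expected [0, 1] range"


# Dummy batch standing in for `next(iter(train_dataloader))`:
check_batch_range(torch.rand(4, 3, 256, 256))
```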

danishnazir commented 1 year ago

Thank you for your response. I am using all of the stock training/evaluation code available in CompressAI; it is given below. For instance, in examples/train.py:

    train_transforms = transforms.Compose(
        [transforms.RandomCrop(args.patch_size), transforms.ToTensor()]
    )

    test_transforms = transforms.Compose(
        [transforms.CenterCrop(args.patch_size), transforms.ToTensor()]
    )

Here transforms.ToTensor() transforms a PIL Image in [0, 255] to a tensor in [0, 1]. Similarly, in eval_model/__main__.py:

def read_image(filepath: Path) -> torch.Tensor:
    assert filepath.is_file()  # filepath must be a pathlib.Path, not a str
    img = Image.open(filepath).convert("RGB")
    return transforms.ToTensor()(img)

To my surprise, when I downloaded the original pretrained hyperprior model and ran the evaluation code, it gave me the original numbers (provided by you in CompressAI). I am unsure what exactly the problem could be. I also checked the reconstructed images x_hat and they look normal.

What tests should I run to ensure everything in the pipeline is correct? What do you suggest?

YodaEmbedding commented 1 year ago

Some other possibilities:

What changes did you make to the script/repository? Does training work without those changes?

danishnazir commented 1 year ago

Thank you for your response. I rechecked my pipeline and I agree with your first point: x was indeed passed to the decoder side (unknowingly), which likely caused this issue. Although it was not intended and is out of scope for CompressAI, could you maybe explain why exactly we cannot pass x to the decoder? (I looked at some literature but couldn't find a satisfying answer.) In my opinion, passing x to the decoder should only enhance the image reconstruction task without much influence on the compression performance (bitrate) itself. Is it because of the tradeoff between rate and distortion, or am I missing something very obvious?

YodaEmbedding commented 1 year ago

A realistic decoder should only receive compressed data. The amount of data it receives is proportional to the rate. If the decoder receives x, then it should (i) simply output x, since that is the best quality reconstruction, and (ii) include the size of uncompressed x in the rate calculation.

Using x to enhance the image reconstruction task should only be done at the encoder side. The enhancement will cost some amount of bits, which is a tradeoff.

An encoder may contain part (or all) of the decoder, if desired.
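As a back-of-the-envelope illustration of point (ii): if the decoder receives the raw image x, its bytes must count toward the rate, which immediately dominates the bitstream. The byte counts below are hypothetical, not measured:

```python
# Toy rate accounting: honest bpp must count every byte the decoder receives.
H, W = 512, 768                # Kodak image dimensions
num_pixels = H * W

compressed_bytes = 20_000      # hypothetical entropy-coded bitstream size
raw_bytes = H * W * 3          # uncompressed 8-bit RGB image

bpp_without_x = compressed_bytes * 8 / num_pixels
bpp_with_x = (compressed_bytes + raw_bytes) * 8 / num_pixels

print(f"bpp (bitstream only):    {bpp_without_x:.3f}")
print(f"bpp (bitstream + raw x): {bpp_with_x:.3f}")  # raw 8-bit RGB alone adds 24 bpp
```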

danishnazir commented 1 year ago

Thank you for your answer. Just one quick follow-up question: what exactly does this mean?

(ii) include the size of uncompressed x in the rate calculation.

Aren't we already using uncompressed x in the rate calculation? For instance, in the code below, bpp is normalized by the number of pixels in x. What should we add to the rate calculation if, hypothetically, we give x (or feature maps of x) to the decoder?

    num_pixels = x.size(0) * x.size(2) * x.size(3)  # batch * height * width
    bpp = sum(len(s[0]) for s in out_enc["strings"]) * 8.0 / num_pixels

YodaEmbedding commented 1 year ago

That relies only upon x.size, which can be considered a tuple of approximately 6 bytes. If that is the only usage of x within the decoder, the problem may be something else. Check what s[0] contains: it is expected to be a byte string; is it a list/dict instead?
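A minimal sketch of that check (`compute_bpp` and the dummy bitstreams are illustrative, not CompressAI code):

```python
def compute_bpp(strings, num_pixels: int) -> float:
    """Bits-per-pixel of entropy-coded bitstreams, with a type check on s[0]."""
    for s in strings:
        assert isinstance(s[0], (bytes, bytearray)), (
            f"expected a byte string, got {type(s[0]).__name__}"
        )
    return sum(len(s[0]) for s in strings) * 8.0 / num_pixels


# Dummy example: two bitstreams of 1000 and 200 bytes for a 512x768 image.
print(compute_bpp([[b"\x00" * 1000], [b"\x00" * 200]], 512 * 768))  # → 0.0244140625
```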