NJUVISION / NIC

End-to-End Learnt Image Compression Codec

Additional pretrained weights for the highest PSNR models #10

Closed · rezafuru closed 1 year ago

rezafuru commented 1 year ago

Hello,

My apologies for bothering you about weights again. I was able to successfully replicate your results with the provided code and weights, which is awesome! However, I noticed that the weights for the lowest-distortion models are missing, i.e., "mse12800" and "mse25600" ("mse200" also seems to be missing, but that one isn't too important for my experiments).

Is there any chance that you still have "mse12800" and "mse25600" uploaded somewhere? I would greatly appreciate it.

BR

tongxyh commented 1 year ago

Hello! I’m glad to hear that you were able to replicate the results with the provided code and weights.

Since this project has been selected as part of the reference software for IEEE 1857.11, further updates and more weights can be found here: https://gitlab.com/NIC_software/NIC.

Note that the code and weights on the new site are updated frequently (it is currently at NIC-v0.5), so the results produced by the newer version may not perfectly align with the results presented in my paper.

Hope this helps!

rezafuru commented 1 year ago

Perfect, many thanks! 25600 is present in the new repository

tzayuan commented 1 year ago

Hi @rezafuru

May I ask what environment you used to replicate the results? I tried PyTorch 1.7.0 and PyTorch 1.5.1 with the officially provided pretrained model, and the results look wrong.

Original photo and reconstructed photo: [image]

Thanks!

rezafuru commented 1 year ago

Hi @tzayuan,

PyTorch 1.7.0 should be fine, but I'm not sure off the top of my head which version I used. I can check later when I'm at home.

I also had to tinker around a bit to get it to work, although for my experiments I only measured PSNR and the predictive loss of a classification model when fed ImageNet samples that had first been passed through the compression model.

Looking at your image, it seems that something may have gone wrong during entropy decoding. Can you try using the network without the entropy coder and see whether you at least get a usable image?

BR

tzayuan commented 1 year ago

Hi @rezafuru

I used my own dataset to run the training script with PyTorch 1.7.0. Judging from the PSNR and MS-SSIM values during training, the model trains normally. However, during inference with the inference script, the image decoded after encoding and decoding looks like the figure above, so there still seem to be some problems. I have no idea what is wrong now.

BR

rezafuru commented 1 year ago

This does increase my suspicion that the problem is with the entropy coder. Try commenting out the entropy (de)coding.

tzayuan commented 1 year ago

Do you have any update? I'm wondering: if I remove the entropy coding module, do I need to retrain the whole model, or can the entropy module be removed without retraining? My understanding is that the context information fed into the entropy module is part of the network.

rezafuru commented 1 year ago

Update regarding what?

The entropy coding itself is not learned (only the entropy model is). During training, you don't actually apply entropy coding to compute the rate; instead, you estimate the entropy of your latents.
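
In sketch form, the training loss computes the rate from the entropy model's likelihoods rather than from any actual bitstream. The module names below are placeholders, not the actual NIC API:

```python
import torch

# Minimal sketch of rate estimation during training; `analysis`, `synthesis`,
# and `entropy_model` are placeholder callables, not the real NIC modules.
def rd_loss(x, analysis, synthesis, entropy_model, lmbda):
    y = analysis(x)                     # latents from the neural encoder
    y_hat = torch.round(y)              # hard quantization (training often uses additive noise or STE instead)
    likelihoods = entropy_model(y_hat)  # per-element probabilities p(y_hat)
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(likelihoods).sum() / num_pixels  # estimated bpp, no arithmetic coding involved
    x_hat = synthesis(y_hat)
    distortion = torch.mean((x - x_hat) ** 2)           # MSE, as in the "mseXXXX" models
    return lmbda * distortion + rate
```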

That the issue might be with the entropy coder would be consistent with your observation that the values look fine during training, but the model does not work during inference.

Simply comment out all calls to the arithmetic coder (AC) and pass the output of the analysis network (neural encoder) directly to the synthesis network (neural decoder). You do not need to retrain anything.
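
Roughly like this (a minimal sketch; `analysis_net`, `synthesis_net`, and the AC calls are illustrative names, not the actual NIC code):

```python
import torch

def reconstruct_without_ac(img, analysis_net, synthesis_net):
    """Encoder -> quantize -> decoder, skipping arithmetic coding entirely."""
    y = analysis_net(img)              # neural encoder
    y_hat = torch.round(y)             # keep the quantization step
    # bitstream = ac_encode(y_hat)     # <-- comment out the arithmetic encoder ...
    # y_hat = ac_decode(bitstream)     # <-- ... and the arithmetic decoder
    return synthesis_net(y_hat)        # decoder sees the encoder-side latents directly
```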

tongxyh commented 1 year ago

Hi, @tzayuan. This should be caused by the numerical instability of floating-point entropy estimation, and @rezafuru has provided a perfect trick (commenting out the real entropy coding) if you just want a quick test.

If you still want to actually encode and decode the bits, you may try performing your test on CPU only (change all '.cuda()' calls to '.cpu()'). Even then, a matching reconstruction cannot be 100% guaranteed.
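
For example, something like this (the model class and checkpoint path are placeholders for whatever your script uses):

```python
import torch

device = torch.device("cpu")
model = NICModel()                     # placeholder for the actual model class
state = torch.load("checkpoint.pth", map_location=device)  # placeholder path
model.load_state_dict(state)
model.to(device).eval()                # encoder and decoder now share identical CPU float arithmetic
```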

To fully solve the numerical instability issue, the neural network (at least the network for entropy estimation) needs to be quantized to fixed point, which has not been implemented in the code yet.

Hope this information helps.

Tong Chen

rezafuru commented 1 year ago

@tzayuan you can read more about decoding instability due to numerical precision in *Integer Networks for Data Compression with Latent-Variable Models* by Ballé et al.

tzayuan commented 1 year ago

Hi @tongxyh, I have used your newest code from GitLab (the IEEE 1857.11 standard version). I trained the model for one epoch (about 180,000 images), and then ran the encoder and decoder scripts for inference on the same hardware platform. The reconstructed image generated by the encoder looks fine (in fact, it still skips the AE module), and the reconstructed image generated by the decoder looks better than before. Could you give me further suggestions? Maybe the number of training epochs is too small? Or do I need to think further about the Q and AE modules?

Original PNG: YourTestPic3
Reconstructed PNG without AE module: YourTestPic3_Recon
Reconstructed PNG with AE module: OutputRecon3

BR, Zeyuan