Justin-Tan / high-fidelity-generative-compression

Pytorch implementation of High-Fidelity Generative Image Compression + Routines for neural image compression
Apache License 2.0

How to reproduce TensorFlow results? #21

Closed Artem531 closed 3 years ago

Artem531 commented 3 years ago

Hi!

Thanks for your implementation of HiFiC! I'm trying to understand why the results of this PyTorch implementation are so different from those of the TensorFlow implementation. Are there any differences between the implementations? How can I reproduce the TensorFlow results? There are some examples below.

Thanks in advance for your reply

Original

kodim01 (1)

(low) Paper

kodim01

(low) TensorFlow

kodim01-0_258bpp

(low) Pytorch (Your pre-trained GAN model)

kodim01_RECON_0_205bpp

(low) Pytorch (What I get when train Pytorch model with default settings and batch_size=12 for GAN)

kodi01_RECON_0_214bpp

Justin-Tan commented 3 years ago

Thanks for bringing this up. The main difference is that I trained for around 72 hours on a single GPU, which comes to around 200k optimization steps, depending on the type of GPU. In the HiFiC paper (Appendix A.6), they state they trained for 2M optimization steps, roughly 10x longer. I suspect that if you were willing to train for longer, you would obtain better results. To the best of my knowledge, all hyperparameters and the architecture are identical to the HiFiC paper and the hyperprior implementation in Ballé et al. (2018), with one exception: the 'synthesis' transforms used in the Hyperprior and Generator.

As you can see in the definition of the synthesis and hyper-synthesis transforms here, instead of performing the standard cross-correlation during upsampling, they perform a true convolution. You can find the implementation details here: they use a custom Convolution class. I can't find any explanation for why this is done; the docs in the TF Compression library claim it results in a speed increase, but I'm not sure how it affects the final result. If you find a good explanation, I'd be interested in hearing it.
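For intuition, a "true convolution" upsampling can be mimicked in PyTorch by spatially flipping the kernel of a standard transposed convolution, which otherwise computes a cross-correlation. This is a hypothetical sketch of the idea for illustration; it is not the layer used in TF Compression or in this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlippedConvTranspose2d(nn.ConvTranspose2d):
    """Transposed conv whose kernel is flipped spatially, so the
    upsampling computes a true convolution instead of the usual
    cross-correlation. Hypothetical sketch, names are my own."""

    def forward(self, x):
        # flip(-2, -1) reverses the kernel along both spatial axes
        return F.conv_transpose2d(
            x, self.weight.flip(-2, -1), self.bias,
            self.stride, self.padding, self.output_padding,
            self.groups, self.dilation)
```

Since weights are initialized randomly, the two variants should be statistically equivalent at the start of training; the distinction presumably only matters for things like porting pretrained TF weights.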

Also PRs to implement this are very welcome.

Artem531 commented 3 years ago

Ok, thank you for your reply! I will try to train the model with half precision using my custom multi-GPU setup and check your hypothesis.

I am closing this issue.

mmSir commented 3 years ago

@Artem531 Hi, Artem531! Have you tried training for more steps (for instance, 500k steps or more) yet? Could you please share your training results? I'd like to know whether this PyTorch implementation can achieve exactly the same results as the HiFiC paper. Here is my test result on CLIC2020 (12 images were excluded because compression ran out of memory on my GPU): image

Thank you!

Artem531 commented 3 years ago

@mmSir Hi, mmSir!

I tried 16-bit precision with apex (amp_level='O2') and found that training with these settings becomes unstable after ~25 epochs (300k steps (?)), so I could not reproduce the results :(

If you have multiple GPUs, you can try my PyTorch Lightning version of this repository for multi-GPU training without 16-bit precision and apex (https://github.com/Artem531/Lightning_HiFiC). But be careful: I did not have much time to find all possible issues (tested on two 1080 Ti GPUs).
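For what it's worth, PyTorch's native AMP (`torch.cuda.amp`) is often more stable than apex at 'O2', since master weights stay in FP32 and the gradient scaler skips any step that produces inf/NaN gradients. A minimal sketch of such a training step (the model, loss, and function names here are placeholders, not this repo's actual training loop):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

def amp_train_step(model, optimizer, scaler, x, y, loss_fn):
    """One mixed-precision optimization step (illustrative sketch)."""
    optimizer.zero_grad(set_to_none=True)
    # Forward pass runs in FP16/FP32 mixed precision under autocast
    with autocast(enabled=torch.cuda.is_available()):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()   # scale loss to avoid FP16 underflow
    scaler.step(optimizer)          # skips the step on inf/NaN gradients
    scaler.update()                 # adjusts the scale factor
    return loss.detach()
```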

mmSir commented 3 years ago

All right. Thanks for your reply! :)