Justin-Tan / generative-compression

TensorFlow Implementation of Generative Adversarial Networks for Extreme Learned Image Compression
MIT License

upsample noise to concatenate with quantized representation #13

Open Jillian2017 opened 6 years ago

Jillian2017 commented 6 years ago

Hi, while reading the paper, I saw it proposes an optional choice to concatenate the representation with noise v. In your code, a dcgan_generator is used to generate the noise v, and its output size is [?, 32, 64, 32]. If the dataset is Cityscapes, the input image is resized to 512×1024, so the feature map's size is 32×64×C (C = 8), and I think the concatenated feature map z has size [?, 32, 64, 32+8] = [?, 32, 64, 40]. My questions are:

  1. Why generate the noise with a DCGAN network?
  2. If the input size changes, for example training on ADE20k with an input size of 512×512, the noise's size can no longer be concatenated to the quantized representation, so do we need to change the DCGAN network?
  3. Adding noise will increase the bpp by a large margin, as its output size is so big.

These are my questions; looking forward to your reply. Thanks for sharing your code.

Justin-Tan commented 6 years ago

Hello,

  1. The DCGAN framework is an arbitrary choice; I have used it before in WGAN with reasonable results, but you may have success trying other architectures, e.g. https://github.com/igul222/improved_wgan_training/blob/master/gan_64x64.py. The authors have not yet specified the details of the noise upsampling, so this is my best guess; they may have a more principled method. If you come across any new insights when reading the paper, I would definitely like to know more.
  2. The noise is just concatenated along the last (channel) dimension, so this is independent of the [height, width] dimensions of the input image (see the sketch after this list).
  3. I think we can consider the noise vector part of the generator architecture and just save the quantized representation to disk, so the entropy of the quantized representation is still an upper bound on the bpp (a quick calculation follows the sketch below).
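Here is a minimal sketch of how the upsampling-and-concatenation could look, written in TF 2.x style for brevity (the repo itself uses TF 1.x graph code). The layer sizes, noise dimension, and the `upsample_noise` helper are illustrative assumptions, not the repo's exact architecture; the shapes follow the Cityscapes setting discussed above:

```python
# Hedged sketch: upsample a noise vector v with a small DCGAN-style stack of
# transposed convolutions, then concatenate with the quantized representation
# w_hat along the channel axis. Not the repo's exact code.
import tensorflow as tf

def upsample_noise(v, target_hw=(32, 64), channels=32):
    """Project noise v of shape [batch, noise_dim] to [batch, H, W, channels]."""
    h, w = target_hw
    # Project and reshape to a quarter-resolution grid: [batch, h//4, w//4, 128]
    x = tf.keras.layers.Dense((h // 4) * (w // 4) * 128, activation='relu')(v)
    x = tf.reshape(x, [-1, h // 4, w // 4, 128])
    # Two stride-2 transposed convolutions: (h//4, w//4) -> (h//2, w//2) -> (h, w)
    x = tf.keras.layers.Conv2DTranspose(64, 5, strides=2, padding='same',
                                        activation='relu')(x)
    x = tf.keras.layers.Conv2DTranspose(channels, 5, strides=2, padding='same')(x)
    return x

batch = 4
w_hat = tf.random.normal([batch, 32, 64, 8])   # quantized representation, C = 8
v = tf.random.normal([batch, 128])             # sampled noise vector
z = tf.concat([w_hat, upsample_noise(v)], axis=-1)
print(z.shape)  # (4, 32, 64, 40), matching the 32 + 8 = 40 channels above
```

And on point 3, a back-of-the-envelope bound on the bpp if only the quantized representation is stored (the noise is sampled at decode time, so it costs no bits). The number of quantization centers L = 5 is an assumption for illustration:

```python
# Naive upper bound on bpp from the quantized representation alone,
# without entropy coding. L = 5 centers is an assumption.
import math

H, W = 512, 1024        # input image size (Cityscapes, as above)
h, w, C = 32, 64, 8     # quantized feature map size
L = 5                   # number of quantization centers (assumption)

bits = h * w * C * math.log2(L)
print(f"bpp upper bound ~ {bits / (H * W):.4f}")  # ~ 0.0726
```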
Jillian2017 commented 6 years ago

Thank you for your timely and generous reply. For answer 1: I also have no idea about the noise mentioned in the paper; thanks for sharing your idea here. Have you ever trained these networks on ADE20k, and how are your results? I am working on this part, but my results are quite bad.

Justin-Tan commented 6 years ago

To produce the images in their paper, the authors train on ADE20k for 50 epochs using the semantic label maps as additional information. I haven't tried training on ADE20k yet, because I don't have the compute power to spare right now, but I will update the readme when I do. One possible explanation for the disparity in image quality is that the authors incorporate a VGG loss (Section 5.2 in the paper) based on intermediate activations in the VGG network, which I haven't implemented yet.
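For reference, a VGG perceptual loss along the lines of Section 5.2 could be sketched as below. This is a hedged illustration, not the authors' exact configuration: the choice of `block4_conv2` as the comparison layer and the plain MSE weighting are assumptions.

```python
# Hedged sketch of a VGG-19 feature (perceptual) loss: compare intermediate
# activations of the real and reconstructed images. Layer choice and
# weighting are assumptions, not the paper's exact setup.
import tensorflow as tf

def make_vgg_feature_extractor(layer_name='block4_conv2'):
    vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet')
    vgg.trainable = False  # fixed feature extractor, no gradient updates
    return tf.keras.Model(vgg.input, vgg.get_layer(layer_name).output)

def vgg_perceptual_loss(extractor, x_real, x_recon):
    """MSE between VGG activations; images expected in [0, 1]."""
    prep = tf.keras.applications.vgg19.preprocess_input  # expects [0, 255] RGB
    f_real = extractor(prep(x_real * 255.0))
    f_recon = extractor(prep(x_recon * 255.0))
    return tf.reduce_mean(tf.square(f_real - f_recon))
```

This term would then be added to the generator's distortion loss with some weighting coefficient, alongside the adversarial loss.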

Of course, if the results are bad for general images despite the model appearing to work well on street scenes, it is highly possible there is a mistake in the implementation somewhere.

Jillian2017 commented 6 years ago

Okay, got it. I will train on ADE20k again without the semantic maps and with the VGG loss. Thanks a lot.

chenxianghu commented 6 years ago

@Justin-Tan We are looking forward to your compression results on the ADE20k dataset; can you try it? @Jillian2017 and I have tried it, but the results are not so good. Maybe some code modification is needed for ADE20k. Thank you very much!

Justin-Tan commented 6 years ago

Yes, I think implementing the VGG perceptual loss may help. Unfortunately I am quite busy at the moment, but it is at the top of the to-do list.