Justin-Tan / generative-compression

TensorFlow Implementation of Generative Adversarial Networks for Extreme Learned Image Compression
MIT License

Relationship between channels C and bpp #5

Open chenxianghu opened 6 years ago

chenxianghu commented 6 years ago

You said C=8 channels (compression to 0.072 bpp) in the Results section. I don't understand the relationship between C and bpp; can you explain it to me?

When we compare this compression method against BPG, the pipeline is: PNG image -> decode -> our encoder -> our quantizer, with the quantized representation as the result.

So the bpp comparison is between our quantized representation and the BPG-encoded image: at the same bpp, which method gives better decoded image quality, or at the same decoded quality, which method gives the lower bpp. Is that right?

Justin-Tan commented 6 years ago

If you read the original paper (https://arxiv.org/pdf/1804.02958.pdf), the upper bound on the bitrate is given by Eq. 5. There, dim(w_hat) is determined by the number of bottleneck channels C (together with the downsampled spatial dimensions), so changing C directly changes the bpp.
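As a rough illustration of that bound, here is a small sketch assuming the paper's defaults of L = 5 quantization centers and an encoder that downsamples each spatial dimension by a factor of 16 (these values are assumptions about the configuration, not read from this repo's code):

    import math

    # Eq. 5 upper bound: bpp <= dim(w_hat) * log2(L) / (W * H),
    # where dim(w_hat) = (H / 16) * (W / 16) * C for a 16x downsampling encoder.
    def bpp_upper_bound(height, width, C, L=5, downsample=16):
        dim_w_hat = (height // downsample) * (width // downsample) * C
        return dim_w_hat * math.log2(L) / (height * width)

    print(bpp_upper_bound(512, 1024, C=8))  # ~0.073 bpp
    print(bpp_upper_bound(512, 1024, C=4))  # ~0.036 bpp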

chenxianghu commented 6 years ago

I want to test the performance of this model, so I modified the single_plot function as shown below and then ran your compress.py. There are two steps:

  1. original image -> quantized representation: about 623 ms
  2. quantized representation -> reconstructed image: about 115 ms

Is my test method right, and what timings do you measure on your side?

If I want to realize end-to-end image compression, I think I should save the quantized representation to a file on the sender side and recover the reconstructed image from that file on the receiver side, with both sender and receiver loading the well-trained model. Is my thinking right?

def single_plot(epoch, global_step, sess, model, handle, name, config, single_compress=False):

    real = model.example
    gen = model.reconstruction
    zz = model.z
    start = time.time()
    # Step 1: run the encoder + quantizer to get the real image and its quantized representation z
    # r, g = sess.run([real, gen], feed_dict={model.training_phase: True, model.handle: handle})
    r, z = sess.run([real, zz], feed_dict={model.training_phase: True, model.handle: handle})
    print("encoder + quantizer time: {:.3f} s".format(time.time() - start))
    print('z shape:', z.shape)
    # print('z result:', z)
    start = time.time()
    # Step 2: feed z back in and run only the generator to reconstruct the image
    g = sess.run(gen, feed_dict={model.training_phase: True, model.z: z})
    print("generator time: {:.3f} s".format(time.time() - start))
chenxianghu commented 6 years ago

I tested your pre-trained model; the timings were: 1) original image -> quantized representation: about 1.5 s, 2) quantized representation -> reconstructed image: about 1 s.

These results differ from my earlier numbers because my own input images are 256x256.

I also tested your model on different images: 1) images from leftImg8bit/train: the results are good, 2) images from leftImg8bit/test: worse than images from the train directory, 3) images from the internet: the results are terrible.

Can this model be used to compress arbitrary images?

Justin-Tan commented 6 years ago

If you want to compress arbitrary images, train on a large dataset of natural images like ImageNet or the ADE20k dataset. The pretrained model was only trained on the Cityscapes dataset, which is a collection of street scenes from Germany and Switzerland.

The distribution of images in ImageNet/ADE20k will be more diverse, so the model will probably take longer to converge. To train on ADE20k, download the dataset from the link in the readme and pass the --ds ADE20k flag:

python3 train.py -ds ADE20k <args>

To train on ImageNet you will have to write your own data loader. I think it will work with the default setup, but you will have to check this.
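For what it's worth, a rough sketch of such a loader using tf.data, assuming all you need is a pipeline that decodes and resizes JPEGs to a fixed training resolution (the function names and sizes below are illustrative, not taken from this repo):

    import tensorflow as tf

    def load_and_resize(path):
        # Decode a JPEG and resize it to the training resolution
        image = tf.image.decode_jpeg(tf.read_file(path), channels=3)
        image = tf.image.convert_image_dtype(image, tf.float32)
        return tf.image.resize_images(image, [512, 512])

    def imagenet_dataset(image_paths, batch_size=1):
        # image_paths: a Python list of JPEG file paths
        ds = tf.data.Dataset.from_tensor_slices(image_paths)
        ds = ds.shuffle(buffer_size=1024)
        ds = ds.map(load_and_resize, num_parallel_calls=4)
        return ds.batch(batch_size).prefetch(2)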

chenxianghu commented 6 years ago

First I trained my model on Cityscapes for 60 epochs and then continued training it on ADE20k for 10 epochs, and I find the compression results become worse. Maybe the model hasn't converged. I think it is hard to compress arbitrary images with one model.

Justin-Tan commented 6 years ago

Don't train using Cityscapes initially; just train on ADE20k. Make sure you pull the latest version, as I fixed a couple of errors in the code.

It will take a long time for the model to converge on ADE20k; the authors originally trained for 50 epochs to get the results in the paper.

chenxianghu commented 6 years ago

OK, this morning I also read the paper and found that I should train on ADE20k from scratch. But one error occurred: it seems the shapes of self.w_hat and Gv didn't match, so I disabled noise sampling by adding a condition like the one below. Now it is working and training. Thank you!

        # Skip noise sampling for ADE20k: the DCGAN generator output Gv and the
        # encoder output w_hat have mismatched spatial shapes for this dataset,
        # so the concat below would fail.
        if config.sample_noise is True and dataset != 'ADE20k':
            print('Sampling noise...')
            # noise_prior = tf.contrib.distributions.Uniform(-1., 1.)
            # self.noise_sample = noise_prior.sample([tf.shape(self.example)[0], config.noise_dim])
            noise_prior = tf.contrib.distributions.MultivariateNormalDiag(loc=tf.zeros([config.noise_dim]), scale_diag=tf.ones([config.noise_dim]))
            v = noise_prior.sample(tf.shape(self.example)[0])
            Gv = Network.dcgan_generator(v, config, self.training_phase, C=config.channel_bottleneck, upsample_dim=config.upsample_dim)
            print('Gv:', Gv)
            # Concatenate the sampled noise features with the quantized representation
            self.z = tf.concat([self.w_hat, Gv], axis=-1)
        else:
            self.z = self.w_hat
wensihan commented 6 years ago

I modified the network as you did, but there is still a problem: Incompatible shapes: [1,3,688,512] vs. [1,3,683,512] at Line 127 (model.py): distortion_penalty = config.lambda_X * tf.losses.mean_squared_error(self.example, self.reconstruction). Do you have any suggestions?

wensihan commented 6 years ago

@chenxianghu

chenxianghu commented 6 years ago

The shapes of self.example and self.reconstruction should be the same. For the Cityscapes dataset it should be [1, 512, 1024, 3], i.e. [batch_size, height, width, channels].
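The 688 vs. 683 mismatch looks like the input height is not a multiple of the network's total down/upsampling factor, so the generator produces a slightly larger output. One possible workaround (an assumption on my part, not something in the repo) is to crop the reconstruction back to the input size before the distortion loss, assuming NHWC layout:

    # Hypothetical change around Line 127 of model.py: crop the reconstruction
    # to the dynamic input height/width so the MSE shapes always match.
    input_shape = tf.shape(self.example)
    self.reconstruction = self.reconstruction[:, :input_shape[1], :input_shape[2], :]
    distortion_penalty = config.lambda_X * tf.losses.mean_squared_error(self.example, self.reconstruction)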

wensihan commented 6 years ago

I use the ADE20k dataset with only the width rescaled to 512px. Are there any other parameters to change besides disabling the sample noise? @chenxianghu

chenxianghu commented 6 years ago

I modified several places (a rough sketch of step 1 is below):

1. Make my own h5 file, using only the 200x200 to 975x975 JPEG images in ADE20k (the same as in the paper).
2. Resize images to [512, 512], with no padding or cropping.
3. Use tf.image.decode_jpeg, not tf.image.decode_png.
4. Modify Network.dcgan_generator to adapt to [512, 512].
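A minimal sketch of step 1, assuming the training script reads an HDF5 file containing a pandas DataFrame with a column of image paths (the file name, key, and column name are guesses; adjust them to whatever the repo's data loader actually expects):

    import glob
    import pandas as pd
    from PIL import Image

    # Collect ADE20k training JPEGs and keep only those within the size range used in the paper
    paths = glob.glob('ADE20K/images/training/**/*.jpg', recursive=True)
    keep = []
    for p in paths:
        w, h = Image.open(p).size
        if 200 <= w <= 975 and 200 <= h <= 975:
            keep.append(p)

    df = pd.DataFrame({'path': keep})
    df.to_hdf('ade20k_train_paths.h5', key='df')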

I think it would be better to learn some of the basics first and then try to train your own model!

wensihan commented 6 years ago

@chenxianghu First, thank you very much for your reply. I still have a question: does 200x200 to 975x975 mean that images larger or smaller than this range are excluded? And so the dataset would contain fewer than 20210 training images, right?

chenxianghu commented 6 years ago

Yes, this is the description from the original paper:

Data sets: We train the proposed method on two popular data sets that come with hand-annotated semantic label maps, namely Cityscapes [42] and ADE20k [43]. Both of these data sets were previously used with GANs [12, 33], hence we know that GANs can model their distribution, at least to a certain extent. Cityscapes contains 2975 training and 500 validation images of dimension 2048×1024px, which we resampled to 1024×512px for our experiments. The training and validation images are annotated with 34 and 19 classes, respectively. From the ADE20k data set we use the SceneParse150 subset with 20210 training and 2000 validation images of a wide variety of sizes (200×200px to 975×975px), each annotated with 150 classes. During training, the ADE20k images are rescaled such that the width is 512px.

wensihan commented 6 years ago

I know this; what puzzles me is whether this sentence ("20210 training and 2000 validation images of a wide variety of sizes (200×200px to 975×975px)") means that the sizes of all 20210 training images vary between 200x200 and 975x975?

chenxianghu commented 6 years ago

I checked, and some JPEG images are not in the 200x200 to 975x975 range; for example, ADE20K\images\training\h\hacienda\ADE_train_00008829.jpg is 1024x768.

wensihan commented 6 years ago

Yes, that's why I am puzzled... Okay, I see; the training dataset is then smaller than 20210 images. Thank you~

Jillian2017 commented 6 years ago

@chenxianghu Hi, did you add noise while training on ADE20k? I came across an error resulting from a mismatch between the noise's dimensions and the encoder network's output, so I wonder if we have to change the method of generating the noise. Also, are your results acceptable on the ADE20k dataset? Mine are quite poor, and the generator has not converged after almost 40 epochs.

chenxianghu commented 6 years ago

@Jillian2017 I add noise while training on ADE20k by modifying the Network.dcgan_generator function to adapt it to 512x512. My generated image quality is also poor after 40 epochs; some generated images even have strange colored blotches that don't exist in the original images. Do you see this too? I don't know why it happens.

zhiqiang-zhu commented 6 years ago

@chenxianghu Hi, can you leave an email address? I would like to ask you some questions.