AlexFuster / Neural_network_image_compression

Convolutional neural network (autoencoder) based lossy image compression + encryption in tensorflow

About Entropy Estimate #1

Open hello-bin opened 5 years ago

hello-bin commented 5 years ago

Hi, I am also working on a graduation project on deep-learning-based image compression. I saw the entropy estimation part you implemented, but I don't quite understand it. Have you tested the image compression performance of your current model? If it's convenient, we could discuss it in more detail privately.

AlexFuster commented 5 years ago

Hi friend,

As far as I can understand using Google Translate (which works surprisingly well for Chinese), you have a question about how I estimate the entropy.

The well-known formula for entropy is sum(-P(i)*log2(P(i))). The problem with this approach is that it is discrete and non-differentiable, so you can't use it as a loss function in a neural network. The idea I had was to train a second neural network to estimate the entropy of an image in a continuous (and differentiable) manner. I use the predictions of this net as part of the loss function of the compression network. This technique seems to work well without having to solve the mathematically complex problem of deriving an accurate continuous relaxation of the entropy yourself.
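
To make the idea concrete, here is a minimal sketch of what I mean (the names and layer sizes are illustrative, not the exact code in this repo):

```python
import numpy as np
import tensorflow as tf

def discrete_entropy(latent_q):
    """Non-differentiable target: Shannon entropy of a quantized latent,
    H = sum_i -P(i) * log2(P(i)), computed from a histogram of symbol counts."""
    _, counts = np.unique(latent_q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def build_entropy_estimator(latent_shape):
    """Small CNN that maps the encoder's output to a scalar entropy estimate."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu',
                               input_shape=latent_shape),
        tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation='softplus'),  # entropy is non-negative
    ])

# The estimator is trained to regress discrete_entropy() of quantized latents;
# the compression autoencoder then uses
#   loss = distortion(x, x_hat) + lambda_rate * entropy_estimator(latent)
# so the rate term stays differentiable with respect to the encoder.
```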

Tell me if you have any other questions (please in English if possible)

AlexFuster commented 5 years ago

About the results, they surpass JPEG but not JPEG2000 (at least not for now)

hello-bin commented 5 years ago

Hi,AlexFuster,

Thanks for your reply. As you said, the well-known entropy formula sum(-P(i)*log2(P(i))) is discrete and non-differentiable, but in fact that depends on how you compute it. If you compute it from histogram statistics, it is indeed non-differentiable. But if you use the distribution parameters of the code to estimate the probability, and estimate the rate of the code from that, it is differentiable. I can recommend a paper about this; its results even outperform the BPG encoder in PSNR: 1) https://arxiv.org/abs/1809.02736?context=cs.CV If you have any questions, we can discuss them. The paper above may provide a better way to do image compression.
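
For illustration, a minimal sketch of that parametric rate estimate (my own notation, assuming each latent element is modelled as a Gaussian and quantized to integer bins, as in that line of papers):

```python
import math
import tensorflow as tf

def gaussian_cdf(x, mu, sigma):
    # Gaussian CDF evaluated elementwise via the error function.
    return 0.5 * (1.0 + tf.math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def estimated_bits(y, mu, sigma, eps=1e-9):
    """Differentiable rate estimate: each latent element is modelled as a
    Gaussian with parameters (mu, sigma); the probability of its integer
    quantization bin is CDF(y + 0.5) - CDF(y - 0.5), and bits = -log2(P)."""
    p_bin = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -tf.math.log(p_bin + eps) / math.log(2.0)

# Estimated rate (in bits) of a latent tensor y:
#   rate = tf.reduce_sum(estimated_bits(y, mu, sigma))
```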

Best wishes! Leon 2019-05-30


AlexFuster commented 5 years ago

Hi, LeonMiao2018,

I didn't know that paper; it is nice to see Balle and Toderici joining forces. There is a point I don't understand and would like to discuss with you:

About the entropy estimation: if I understood it right, they use a second network that estimates the parameters of a Gaussian distribution modelling the latent (and hence its rate), in a similar way to VAEs. I use a similar approach, but instead of estimating the parameters, I directly estimate the entropy (in fact I don't estimate the entropy itself, but the size of the latent representation when encoded as PNG). My question is: why am I obtaining poorer results with my approximation? Is it because of the instability of training two networks jointly? Or is it that using PNG encoding over the latent representation is a bad idea, and arithmetic coding would be better?

By the way, I use MS-SSIM for the distortion loss instead of MSE or PSNR; for me it gives better results.
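
For reference, a minimal sketch of such an MS-SSIM distortion term in TensorFlow (assuming images scaled to [0, 1] and large enough for the default 5-scale setting):

```python
import tensorflow as tf

def msssim_distortion(x, x_hat):
    """Distortion term: 1 - MS-SSIM, averaged over the batch.
    Assumes inputs in [0, 1] and spatial sizes large enough for the
    default 5-scale MS-SSIM (roughly 160x160 or more)."""
    return 1.0 - tf.reduce_mean(tf.image.ssim_multiscale(x, x_hat, max_val=1.0))

# Combined objective (lambda_rate is a hypothetical trade-off weight):
#   loss = msssim_distortion(x, x_hat) + lambda_rate * entropy_estimate
```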

Give me your thoughts.

best regards, Alex

hello-bin commented 5 years ago

Hi, AlexFuster,

In the paper, the authors use the hyperprior and context information to estimate the distribution of the code. At training time, to optimize rate-distortion effectively, we must get a precise rate estimate of the code, so in the paper they constrain the code distribution. I ran a test: with the code distribution constrained, we tried both adaptive arithmetic coding and arithmetic coding based on the learned code distribution for the entropy coder, and found that coding based on the code distribution gives a better result than adaptive arithmetic coding. So I think that, besides the structure of the network, the code table is also a factor that influences compression performance.

So I have two questions about your rate estimate: 1) Is the rate estimate precise? 2) When you test your model, how do you ensure the bitstream file size is close to your target rate?

As for "Why do they use 2 levels of compression?", in my opinion the reason is that we need to estimate different parameters for different images. If we do not introduce some extra side information, we cannot ensure that different images get different estimated parameters. We also ran a test: with fixed estimated parameters, the encoder is about 0.5 dB below JPEG2000 in PSNR, but after introducing per-image estimated parameters, it is about 1.5 dB above JPEG2000. By the way, we use MSE as the distortion loss.
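
A rough sketch of that hyperprior idea (layer sizes and names are illustrative, not the paper's exact architecture): a small hyper autoencoder looks at the latent and predicts a per-element scale, which then parameterizes the same Gaussian bin-probability rate estimate sketched above.

```python
import tensorflow as tf

def build_hyper_encoder(hyper_channels=64):
    """Maps the latent y to side information z."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(hyper_channels, 3, strides=2, padding='same',
                               activation='relu'),
        tf.keras.layers.Conv2D(hyper_channels, 3, strides=2, padding='same'),
    ])

def build_hyper_decoder(latent_channels, hyper_channels=64):
    """Maps quantized side information back to a per-element scale sigma."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2DTranspose(hyper_channels, 3, strides=2,
                                        padding='same', activation='relu'),
        # softplus keeps the predicted scales strictly positive
        tf.keras.layers.Conv2DTranspose(latent_channels, 3, strides=2,
                                        padding='same', activation='softplus'),
    ])

# sigma = hyper_decoder(quantize(hyper_encoder(y)))   # quantize() is a placeholder
# rate  = tf.reduce_sum(estimated_bits(y, mu=0.0, sigma=sigma))  # see earlier sketch
```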

AlexFuster commented 5 years ago

Hi LeonMiao2018,

1) My entropy estimate is quite precise. In the end, it is simply a CNN that takes the encoder's output as input and predicts its entropy. To train this network I use the discrete entropy sum(-P(i)*log2(P(i))) as the target.

2) In my model, the bitrate cannot be controlled at test time, because I train the network to maximize quality while minimizing entropy. I'm thinking of modifying this by adding the desired entropy as an input to the network and training it to produce latent representations with that entropy, similar to what conditional GANs do with targets. The loss function would then be to maximize quality while producing an entropy that is as close as possible to the specified one.
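
A minimal sketch of what that conditional loss could look like (target_entropy and the weight lam are hypothetical knobs, not something implemented here yet):

```python
import tensorflow as tf

def rate_targeted_loss(x, x_hat, entropy_estimate, target_entropy, lam=1.0):
    """Instead of simply minimizing the estimated entropy, penalize its
    distance to a user-specified target so the network learns to hit a
    requested rate. lam is a hypothetical trade-off weight."""
    distortion = 1.0 - tf.reduce_mean(tf.image.ssim_multiscale(x, x_hat, max_val=1.0))
    rate_penalty = tf.abs(entropy_estimate - target_entropy)
    return distortion + lam * rate_penalty
```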

As you have probably noticed, my background in compression is very limited; I just found it to be an interesting application of deep learning. So forgive me if I ask stupid questions :)

I have 2 questions: 1) So, at test time, do you think that using PNG to encode the encoder's output is a bad idea? It probably is, because I don't see anyone doing it, but PNG's DEFLATE compression (similar in spirit to LZW) is very good at compressing images, and the encoder's output is a set of channels that can be reshaped into an image. So why not do it? Would arithmetic coding be better, and why? (There is a rough sketch of this PNG measurement after question 2 below.)

2) Why does everybody use MSE? If I train with MSE and use the model on high-resolution images, I obtain very blurry results. Or do you divide the image into blocks and compress them separately?
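
Following up on question 1, a simplified sketch of how the PNG size of the latent could be measured (assuming the latent is already quantized to 8 bits; not necessarily the exact code in this repo):

```python
import io
import numpy as np
from PIL import Image

def latent_png_bytes(latent_q):
    """Tile the quantized latent's channels (H, W, C) into one 2D grayscale
    image, encode it as PNG in memory, and return the bitstream size in bytes."""
    h, w, c = latent_q.shape
    grid = latent_q.transpose(2, 0, 1).reshape(c * h, w)  # stack channels vertically
    buf = io.BytesIO()
    Image.fromarray(grid.astype(np.uint8), mode='L').save(buf, format='PNG')
    return buf.getbuffer().nbytes
```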