ZhengxueCheng / Learned-Image-Compression-with-GMM-and-Attention

Repository of the paper "Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules"

Why does arithmetic coding compress y_hat pixel by pixel? #3

Closed rongpan123 closed 4 years ago

rongpan123 commented 4 years ago

In network.py, y_hat is compressed by the encoder pixel by pixel. If I compress y_hat all at once instead, will it affect the PSNR or MS-SSIM? What do you think?

ZhengxueCheng commented 4 years ago

> In network.py, y_hat is compressed by the encoder pixel by pixel. If I compress y_hat all at once instead, will it affect the PSNR or MS-SSIM? What do you think?

Hi, thanks for your question. The PSNR and MS-SSIM will stay the same, since arithmetic coding is lossless. You can certainly run parts of it in parallel, but you still need to update y_hat pixel by pixel, because the masked convolution requires sequential decoding: each symbol's distribution depends on previously decoded symbols.
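To illustrate why decoding cannot be parallelised across spatial positions, here is a minimal sketch of a raster-scan decode loop. The names `context_model` and `decode_symbol` are hypothetical stand-ins, not functions from this repository:

```python
import numpy as np

# Minimal sketch of sequential raster-scan decoding. `context_model`
# and `decode_symbol` are hypothetical stand-ins, not repo functions.
def decode_sequential(H, W, context_model, decode_symbol):
    y_hat = np.zeros((H, W), dtype=np.float32)
    for i in range(H):
        for j in range(W):
            # The masked convolution may only see positions that precede
            # (i, j) in raster-scan order, so each symbol needs the
            # previously decoded ones -- no spatial parallelism here.
            params = context_model(y_hat, i, j)
            y_hat[i, j] = decode_symbol(params)
    return y_hat

# Toy stand-ins: the "context" is the sum of the causal left/up
# neighbours, and "decoding" just adds one to it.
def toy_context(y, i, j):
    left = y[i, j - 1] if j > 0 else 0.0
    up = y[i - 1, j] if i > 0 else 0.0
    return left + up

print(decode_sequential(2, 2, toy_context, lambda p: p + 1.0))
```

During training this loop is unnecessary, because the masked convolution computes all causal contexts in one pass over the known ground-truth y_hat.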

rongpan123 commented 4 years ago

When I train the model, I also need to update y_hat pixel by pixel, which takes a very long time. Could you please give some details about the parallel operation?

ZhengxueCheng commented 4 years ago

> When I train the model, I also need to update y_hat pixel by pixel, which takes a very long time. Could you please give some details about the parallel operation?

Hi, during training, masked convolution is used precisely to avoid updating y_hat pixel by pixel. The mask assigns one to the elements that come before the current position in raster-scan order and zero to the elements at and after it, which guarantees that the current y_hat depends only on previously encoded elements. For details, you can refer to the PixelCNN papers, such as https://arxiv.org/pdf/1606.05328.pdf.
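The raster-scan mask described above can be sketched as follows. This is a minimal numpy construction of a PixelCNN-style kernel mask (the "A"/"B" naming follows the PixelCNN convention; this is not code from the repository):

```python
import numpy as np

def raster_scan_mask(k, mask_type="A"):
    # Build a k x k mask that is 1 for kernel positions strictly before
    # the center in raster-scan order and 0 at and after it ("A" mask).
    # A "B" mask additionally allows the center position itself, which
    # PixelCNN uses in layers after the first.
    mask = np.zeros((k, k), dtype=np.float32)
    c = k // 2
    mask[:c, :] = 1.0   # all rows above the center row
    mask[c, :c] = 1.0   # center row, columns left of the center
    if mask_type == "B":
        mask[c, c] = 1.0
    return mask

print(raster_scan_mask(5, "A"))
```

Multiplying the convolution weights by this mask before applying them makes every output depend only on causal neighbours, so the whole training pass runs as one convolution instead of a sequential loop.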