ctmakro / npeg

Neural Network Image Compression

inputs on using Perceptual loss #1

Open ShashiAI opened 6 years ago

ShashiAI commented 6 years ago

Hey there!

Nice work on using perceptual loss for image compression using NN.

I had something similar in mind and was thinking of trying it. I came across your code and found it interesting that you left the comment "Not very successful" about using the perceptual loss!

Could you please elaborate on why it was not successful?

Your input would be much appreciated!

Regards, Shashi.

ctmakro commented 6 years ago

The current SOTA for increasing image quality over a traditional MSE loss is an adversarial loss (via a GAN), not a perceptual loss (the distance between middle-layer features of a pre-trained VGG/ResNet, as in neural style transfer).
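For concreteness, that feature-distance loss is just an MSE taken in the feature space of a frozen pre-trained network. A minimal sketch, assuming PyTorch/torchvision; the VGG16 layer cut is a common illustrative choice, not necessarily what this repo used:

```python
# A minimal sketch of a perceptual loss as a feature-space MSE, assuming
# PyTorch and torchvision are available. The cut at relu3_3 is a common
# illustrative choice, not necessarily this repo's setup.
import torch.nn as nn
import torchvision.models as models

class PerceptualLoss(nn.Module):
    def __init__(self, cut=16):  # features[:16] ends at relu3_3 in VGG16
        super().__init__()
        vgg = models.vgg16(pretrained=True).features[:cut]
        for p in vgg.parameters():
            p.requires_grad = False  # frozen feature extractor
        self.features = vgg.eval()
        self.mse = nn.MSELoss()

    def forward(self, recon, target):
        # distance measured in feature space rather than pixel space
        return self.mse(self.features(recon), self.features(target))
```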

Recent SOTA papers on image super-resolution and compression all include some form of adversarial loss.
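The adversarial term those papers add on top of a reconstruction loss looks roughly like this; a hedged sketch assuming PyTorch, where `discriminator` is a hypothetical trained critic and the non-saturating BCE form is just one common variant:

```python
# A hedged sketch of the generator-side adversarial term, assuming
# PyTorch; `discriminator` is a hypothetical trained critic and the
# non-saturating BCE form is one common variant, not the author's setup.
import torch
import torch.nn.functional as F

def adversarial_loss(discriminator, recon):
    # push reconstructions toward the critic's "real" decision
    logits = discriminator(recon)
    return F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))
```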

I think a perceptual loss will only be useful if you want to use an 'underpowered decoder' (to save computation, perhaps) that does not have the capacity to imagine all the visible details, only a subset of them. In such cases it does make sense to weight the details by their contribution to human perception.

ShashiAI commented 6 years ago

Thanks for the information!

Regarding "In such cases it does make sense to weight the details by their contribution to human perception": I feel the same way and want to run more experiments on that!

By the way, did you do a subjective comparison between the networks trained with the MSE loss and the perceptual loss? Was there a visible improvement?

I ask because in the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", they used a perceptual loss for super-resolution and their results look pretty amazing!

Your thoughts on this will be much appreciated!

Best Regards, Shashi

ctmakro commented 6 years ago

MSE loss does not care about the properties of human visual perception.

Humans care more about the shape and edges of an object (just as in ILSVRC, hence the use of pre-trained classification models for the perceptual loss), while MSE cares more about the closeness of RGB values at each pixel. Therefore, when the compression ratio is high, a model trained with MSE tends to discard edges but preserve color, which is obviously bad for perceptual quality and results in blurry images.

I did make some comparisons (MSE vs. perceptual), but not with this project. I'm pretty sure that in general MSE leads to blurry images when you don't have enough information for reconstruction. The same blurring effect has been observed in various image-to-image translation papers, where it is used to show the necessity of adversarial losses.

It's like having a pixel that should be either white or black for the image to look real, but by applying MSE you make the network output grey. MSE also ignores inter-pixel relationships.
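That grey-pixel argument is easy to verify numerically: fitting a single constant under MSE to pixels that are equally often black and white recovers the mean, 0.5, i.e. grey:

```python
# Tiny numeric check of the grey-pixel argument: fit a single constant
# under MSE to pixels that are equally often black (0) and white (1).
import numpy as np

targets = np.array([0.0, 1.0])           # the pixel should be 0 or 1
candidates = np.linspace(0.0, 1.0, 101)  # constant predictions to try
mse = [np.mean((targets - c) ** 2) for c in candidates]
print(candidates[int(np.argmin(mse))])   # 0.5 -> grey, never in the data
```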

That's where a perceptual loss comes in handy: it is more sensitive to shape and edges.

Edit: in my last response I said, "I think a perceptual loss will only be useful if you want to use an 'underpowered decoder'".

That was incorrect. When the objective is compression (as opposed to super-resolution), a perceptual loss can help the network preserve (i.e., describe with more bits) the details that matter most to human perception. In other words, with a perceptual loss applied, more bits will be spent on edges than on color.
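One way to picture that bit allocation is a rate-distortion objective whose distortion term is perceptual. A hedged sketch assuming PyTorch; the names and the weight `lam` are illustrative, not this project's actual training loss:

```python
# A hedged sketch of a rate-distortion objective with a perceptual
# distortion term, assuming PyTorch; the names and the weight `lam`
# are illustrative, not this project's actual training loss.
import torch

def rd_loss(feat_recon, feat_orig, bits_per_image, lam=0.01):
    rate = bits_per_image.mean()  # estimated code length (entropy model)
    # perceptual distortion: MSE between, e.g., VGG features of the
    # reconstruction and the original (see the sketch above)
    distortion = torch.mean((feat_recon - feat_orig) ** 2)
    return rate + lam * distortion
```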

I forgot why I was unsuccessful at applying the perceptual loss. I'll let you know once I figure it out...

ctmakro commented 6 years ago

@ShashiAI you might want to take a look at https://github.com/google/butteraugli.

ShashiAI commented 6 years ago

I had seen this :) As it turns out, I was planning to use it as a metric to see how well the network learns! I'll let you know how it goes!
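For anyone following along, here is a minimal sketch of wrapping the butteraugli binary as a metric, assuming it is built and on PATH; the file names are hypothetical:

```python
# A minimal sketch of scoring a reconstruction with the butteraugli
# CLI from google/butteraugli, assuming the binary is built and on
# PATH; the file names are hypothetical.
import subprocess

def butteraugli_score(original_png, decoded_png):
    # butteraugli prints its perceptual distance to stdout
    out = subprocess.run(
        ["butteraugli", original_png, decoded_png],
        capture_output=True, text=True, check=True)
    return float(out.stdout.split()[0])

print(butteraugli_score("original.png", "decoded.png"))
```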