Justin-Tan / generative-compression

TensorFlow Implementation of Generative Adversarial Networks for Extreme Learned Image Compression
MIT License
511 stars · 108 forks

Question about gradient calculation in the quantizer #27

Open Chao-Lu opened 5 years ago

Chao-Lu commented 5 years ago
  1. I found that you used "tf.stop_gradient()" to deal with the non-differentiable operations "tf.argmin()" and "tf.round()". However, "tf.stop_gradient()" blocks the gradient contribution of the current node, which means your encoder network will never update its parameters: all nodes before the quantizer (i.e., the encoder network) are excluded from the gradient calculation.

  2. Are you trying to make the quantizer have a fixed gradient (such as "1") at all times? If so, I think you have to re-define the gradient of the quantizer rather than use "tf.stop_gradient()".
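[Editor's note: the fixed-gradient idea in point 2 is commonly implemented as the "straight-through" estimator. A minimal sketch, assuming TF 2.x eager mode; `quantize_ste` is a hypothetical helper, not code from this repository. The trick is that `tf.stop_gradient` wraps only the non-differentiable residual, so the forward pass rounds while backprop sees an identity gradient and the encoder still receives updates:]

```python
import tensorflow as tf

def quantize_ste(x):
    # Forward: round(x). Backward: d/dx = 1, because the
    # non-differentiable residual (round(x) - x) is detached.
    return x + tf.stop_gradient(tf.round(x) - x)

x = tf.Variable([0.2, 1.7, -0.6])
with tf.GradientTape() as tape:
    y = quantize_ste(x)          # -> [0., 2., -1.]
    loss = tf.reduce_sum(y)
grad = tape.gradient(loss, x)    # -> [1., 1., 1.], identity gradient
```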

Justin-Tan commented 5 years ago

I will look into the first one; can you give more details about the second point?

sun107 commented 3 years ago

I wonder how to deal with the gradient when I apply operations such as "tf.round". Will setting the gradient of these operations to 1 help? Could you offer some references? Thank you very much!

Justin-Tan commented 3 years ago

IIRC, I manually overrode the gradient so that it is just the identity - this is the 'straight-through' estimator that works surprisingly well in practice.


sun107 commented 3 years ago

Thanks a lot for your answer. I've got another question: I wonder how the batch size affects the result. Have you ever tried a bigger batch size? Thanks for your reply again!

Justin-Tan commented 3 years ago

The batch size is limited by GPU memory. In recent papers about learned image compression the batch size is usually set to something low like 8 or 16, so I imagine that other hyperparameters would be more important to tune.