Open XiaoyuShi97 opened 3 years ago
Hi @btwbtm, thanks for your interest in our work. Softmax is also used in several network quantization and pruning methods to soften one-hot distributions. In my opinion, softmax may also work in our SMSR, but I have not tried it. In our experiments, Gumbel softmax is adopted since it is theoretically identical to a one-hot distribution as the temperature approaches zero, while softmax is not.
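For readers unfamiliar with the trick, here is a minimal NumPy sketch of the standard Gumbel-softmax sampler from "Categorical Reparameterization with Gumbel-Softmax" (not the SMSR code itself): Gumbel noise makes the sample stochastic, and a low temperature `tau` pushes the output toward one-hot, whereas a plain softmax of the logits is deterministic and stays smooth.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Sample a relaxed one-hot vector via the Gumbel-softmax trick."""
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise: g = -log(-log(U)), U ~ Uniform(0, 1)
    u = rng.uniform(1e-10, 1.0, size=np.shape(logits))
    g = -np.log(-np.log(u))
    y = (np.asarray(logits) + g) / tau
    y = y - y.max()          # subtract max for numerical stability
    e = np.exp(y)
    return e / e.sum()

logits = np.array([1.0, 2.0, 0.5])
rng = np.random.default_rng(0)
soft = gumbel_softmax(logits, tau=1.0, rng=rng)   # smooth, stochastic sample
hard = gumbel_softmax(logits, tau=0.01, rng=rng)  # nearly one-hot sample
```

As `tau` shrinks, the scaled logit gaps blow up and the softmax concentrates almost all mass on a single entry, which is the "identical to one-hot" behavior the answer refers to; a plain `torch.softmax` over the raw logits never sharpens this way on its own.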
I found that the implementation of Gumbel softmax in your code is different from the original paper ("Categorical Reparameterization with Gumbel-Softmax"). Why did you modify it? Which version works better?
Hi, nice work! I am a bit confused about Gumbel softmax. You mention in your paper that Gumbel softmax is used during training. I wonder if it can be replaced by a plain softmax (i.e. torch.softmax)? Could you please explain this design choice in more detail? Thanks!