Is the Gumbel-Softmax formulation accurate?

Thanks for releasing the code!

I have been reviewing how the Gumbel-Softmax trick [1] is used. Both the paper and the code state that the "relevance scores are interpreted as log probabilities" [2]. However, log probabilities are never positive, while the output of a convolutional layer is unconstrained in sign, so how can that output be interpreted as a log probability? (I would not expect this to break training, but it could silently yield suboptimal performance due to inaccurate approximate sampling from the discrete distribution.)
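
To make the question concrete, here is a minimal sketch of the comparison I have in mind. It is not taken from the repository: the scores in `raw`, the temperature `tau`, and all function names are hypothetical. It applies the relaxation from Equation 1 of [1] once to raw, unconstrained scores and once to the same scores normalized with a log-softmax (so they are valid log probabilities), using the same Gumbel noise for both.

```python
import math
import random


def log_softmax(scores):
    # Normalize scores so that exp(result) sums to 1 (valid log probabilities).
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def sample_gumbel(n, rng):
    # G = -log(-log(U)) with U ~ Uniform(0, 1); bounds avoid log(0).
    return [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
            for _ in range(n)]


def gumbel_softmax(scores, noise, tau=1.0):
    # Relaxed sample: softmax((scores + noise) / tau).
    y = [(s + g) / tau for s, g in zip(scores, noise)]
    m = max(y)
    e = [math.exp(v - m) for v in y]
    z = sum(e)
    return [v / z for v in e]


rng = random.Random(0)
raw = [2.3, -0.7]              # hypothetical raw scores for two gate options
normalized = log_softmax(raw)  # same scores as proper log probabilities
noise = sample_gumbel(len(raw), rng)

y_raw = gumbel_softmax(raw, noise)
y_norm = gumbel_softmax(normalized, noise)
max_diff = max(abs(a - b) for a, b in zip(y_raw, y_norm))
print(y_raw, y_norm, max_diff)
```

Since the log-softmax only subtracts the same constant from every score, and a softmax is unchanged by such a shift, I would expect `y_raw` and `y_norm` to agree up to floating-point error in this sketch. If that also holds for the actual implementation, it may be the intuition I am missing, but I have not verified it against the repository code.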

Please let me know, maybe there is a subtle intuition or training dynamic at play here that I am missing. Thanks!

[1] https://arxiv.org/pdf/1611.01144.pdf (Equation 1)

[2] https://arxiv.org/pdf/1711.11503.pdf (Section 3.3, page 5)

