andreasveit / convnet-aig

PyTorch implementation for Convolutional Networks with Adaptive Inference Graphs
BSD 3-Clause "New" or "Revised" License

Is the Gumbel-Softmax formulation accurate? #10

Open atiorh opened 4 years ago

atiorh commented 4 years ago

Thanks for releasing the code!

I have been reviewing how the Gumbel-Softmax trick [1] is used. Both the paper and the code suggest that the "relevance scores are interpreted as log probabilities" [2], but log probabilities are never positive, whereas the output of a convolutional layer is unconstrained. How can that output be interpreted as a log probability? (This is unlikely to break training, but it could silently yield suboptimal performance due to inaccurate approximate sampling from the discrete distribution.)
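To make the question concrete, here is a minimal sketch of the Gumbel-Softmax sample from Equation 1 of [1], computed once from raw scores and once from properly normalized log probabilities. It is plain Python, not the repository's code: the function names, the two example scores, and the temperature `tau` are all assumptions made for illustration.

```python
import math
import random


def log_softmax(scores):
    """Turn unconstrained scores into normalized log probabilities (all <= 0)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def sample_gumbel(rng):
    """Draw G = -log(-log(U)) with U ~ Uniform(0, 1)."""
    u = max(rng.random(), 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, noise, tau=1.0):
    """Relaxed sample: softmax((logits + noise) / tau), as in Equation 1 of [1]."""
    y = [(l + g) / tau for l, g in zip(logits, noise)]
    m = max(y)
    e = [math.exp(v - m) for v in y]
    z = sum(e)
    return [v / z for v in e]


rng = random.Random(0)
scores = [1.7, -0.4]  # hypothetical raw relevance scores (execute, skip)
noise = [sample_gumbel(rng) for _ in scores]

raw = gumbel_softmax(scores, noise)                      # scores used as-is
normalized = gumbel_softmax(log_softmax(scores), noise)  # true log probabilities

print("log-softmax of scores:", log_softmax(scores))
print("sample from raw scores:       ", raw)
print("sample from normalized scores:", normalized)
```

Because a softmax is unchanged when the same constant is added to every input, the two samples in this sketch should agree up to floating-point error for a shared noise draw. The sketch does not reproduce how the repository applies the trick, so it does not by itself settle the question for the actual gating code.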

Please let me know, maybe there is a subtle intuition or training dynamic at play here that I am missing. Thanks!

[1] https://arxiv.org/pdf/1611.01144.pdf (Equation 1)
[2] https://arxiv.org/pdf/1711.11503.pdf (Section 3.3, page 5)