danielkelshaw / ConcreteDropout

PyTorch implementation of 'Concrete Dropout'
https://arxiv.org/abs/1705.07832
MIT License

is softmax required? #11

Closed: as595 closed this issue 4 years ago

as595 commented 4 years ago

The model passes the logits through softmax in the output layer, but then the F.cross_entropy() function is used, which already combines log_softmax and nll_loss internally. I think the softmax in the output layer may not be necessary?
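For illustration, a minimal sketch of the redundancy (not the repo's actual code): F.cross_entropy expects raw logits, so feeding it the softmax output means the normalisation is applied twice and the loss value changes.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw network outputs
target = torch.randint(0, 10, (4,))

# What F.cross_entropy expects: raw logits.
loss_correct = F.cross_entropy(logits, target)

# What happens if the model has already applied softmax:
probs = F.softmax(logits, dim=1)
loss_double = F.cross_entropy(probs, target)  # log_softmax applied on top of probabilities

print(loss_correct.item(), loss_double.item())  # the two values differ
```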

danielkelshaw commented 4 years ago

Hi, it's been a little while since I looked at this code - but I'll take a look. Thanks for bringing it up!

danielkelshaw commented 4 years ago

Thanks @as595 for pointing this out, I'll fix it later today - or you're more than welcome to submit a PR.

Brief Documentation of the Issue:

It appears the softmax in the output layer is unnecessary:

https://github.com/danielkelshaw/ConcreteDropout/blob/7c0fdd79d13d88e61ee68b4d15269cfdf5d64043/mnist_example.py#L42
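A possible fix is to return the raw logits from forward() and leave the normalisation to the loss function. A simplified sketch, with a hypothetical architecture (the real model in mnist_example.py, including its ConcreteDropout layers, will differ):

```python
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # Hypothetical simplified model, for illustration only.
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # return F.softmax(x, dim=1)  # redundant: F.cross_entropy normalises internally
        return x                      # return raw logits instead
```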

In the train() function, the softmax is effectively applied twice, once in the model's output layer and again inside F.cross_entropy():

https://github.com/danielkelshaw/ConcreteDropout/blob/7c0fdd79d13d88e61ee68b4d15269cfdf5d64043/mnist_example.py#L57-L59
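With the softmax removed from forward(), the training step can pass the raw logits straight to F.cross_entropy. A sketch with illustrative names, not the repo's exact code:

```python
import torch.nn.functional as F

def train_step(model, optimizer, data, target):
    """One training step on raw logits; names here are illustrative."""
    optimizer.zero_grad()
    output = model(data)                    # raw logits, no softmax in forward()
    loss = F.cross_entropy(output, target)  # log_softmax + NLL applied once, here
    loss.backward()
    optimizer.step()
    return loss.item()
```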

Code for the test() function will need to change slightly:

https://github.com/danielkelshaw/ConcreteDropout/blob/7c0fdd79d13d88e61ee68b4d15269cfdf5d64043/mnist_example.py#L78-L83

Predictions are currently based on the logits passed through softmax, so once the softmax is removed from the model the code will need to be amended to ensure the test_loss is still calculated correctly.
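One way to amend test(): since softmax is monotonic, the argmax of the logits equals the argmax of the softmax, so the predictions are unchanged, and the summed loss can come straight from F.cross_entropy on the logits. A sketch with illustrative loader/device handling:

```python
import torch
import torch.nn.functional as F

def test(model, test_loader, device):
    """Evaluation loop sketch; variable names are illustrative."""
    model.eval()
    test_loss, correct = 0.0, 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)  # raw logits
            test_loss += F.cross_entropy(output, target, reduction='sum').item()
            pred = output.argmax(dim=1)  # argmax of logits == argmax of softmax
            correct += pred.eq(target).sum().item()
    test_loss /= len(test_loader.dataset)
    return test_loss, correct
```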