Hi, I've done some testing for 20 epochs. While the awd-lstm WeightDrop results scale as expected, there is a huge difference between dropout=0 and 0.1 for the torchnlp WeightDrop. I'm really curious why this is happening, as the model becomes too hard to train even with 0.1 dropout.
After 20 epochs:

wdrop = 0:
- loss: 53.8 - accuracy: 72.0

wdrop = 0.1:
- awd-lstm: loss: 58.8 - accuracy: 67.0
- torchnlp: loss: 68.1 - accuracy: 57.2

wdrop = 0.9:
- awd-lstm: loss: 66.0 - accuracy: 59.0
- torchnlp: loss: 69.9 - accuracy: 55.92
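For context, differences like this between weight-drop implementations often come down to how and when the dropout mask is applied to the recurrent weights. Below is a minimal illustrative sketch (my own simplified cell, not the actual awd-lstm or torchnlp code) of the intended behavior: the hidden-to-hidden matrix is masked once per forward pass, so a single mask is shared across all time steps of the sequence.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDropRNNCell(nn.Module):
    """Illustrative DropConnect sketch: a vanilla tanh RNN cell whose
    hidden-to-hidden weight matrix is dropped once per forward pass.
    This is a simplified stand-in, not the awd-lstm or torchnlp code."""

    def __init__(self, input_size, hidden_size, wdrop=0.1):
        super().__init__()
        self.w_ih = nn.Parameter(torch.randn(hidden_size, input_size) * 0.1)
        self.w_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        self.wdrop = wdrop

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        # Key property of weight drop: sample ONE mask for the whole
        # sequence, rather than re-sampling at every time step.
        w_hh = F.dropout(self.w_hh, p=self.wdrop, training=self.training)
        h = x.new_zeros(x.size(1), self.w_hh.size(0))
        outputs = []
        for t in range(x.size(0)):
            h = torch.tanh(x[t] @ self.w_ih.t() + h @ w_hh.t())
            outputs.append(h)
        return torch.stack(outputs)
```

If one implementation re-samples the mask per time step, re-applies dropout cumulatively, or fails to restore the raw weights between batches, the effective regularization can be far stronger than the nominal rate, which would match the gap you see already at wdrop = 0.1.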