ahmetumutdurmus / awd-lstm

Replication of "Regularizing and Optimizing LSTM Language Models" by Merity et al. (2017).
https://arxiv.org/abs/1708.02182

Perplexity results and hyperparameters #1


AndreaLK3 commented 4 years ago

Hello, I would be interested in knowing whether you managed to replicate the performance of AWD-LSTM on WikiText-2 (~~Validation PPL=60.0, Test PPL=57.3~~ Validation PPL=68.6, Test PPL=65.8).

I only saw this port now. I made AWD-LSTM compatible with PyTorch 1.5.0 using a combination of my own modifications and the PyTorch 1.2.0 port at https://github.com/mourga/awd-lstm-lm

I used the default hyperparameters of the original model at https://github.com/salesforce/awd-lstm-lm, and I got slightly worse results: Validation PPL=78.49 and Test PPL=74.98. Should I just try the hyperparameters you have specified here?
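For reference, the default WikiText-2 command from the salesforce README that I followed is roughly the one below; I am quoting it from memory, so it should be double-checked against that README:

```
python main.py --epochs 750 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --seed 1882
```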

ahmetumutdurmus commented 4 years ago

Hi, I was able to replicate the results presented in the paper to a fairly reasonable degree (±1.0 PPL due to random initialization). However, the "Validation PPL=60.0, Test PPL=57.3" figures you gave seem to belong to the PTB dataset. The WikiText-2 results should be: Validation PPL = 68.6, Test PPL = 65.8. Perhaps you should try the hyperparameters I have specified here, as you suggested? If that doesn't work, we'll take another look.

AndreaLK3 commented 4 years ago

Using the hyperparameters in my version, I get valid ppl=78.8 and test ppl=75.5. It may be due to my version of the WeightDrop not working as well as the original, or to PyTorch 1.5.0. I will clone the repo from here, adjust whatever is necessary, and see what results I get.
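For context, the idea I am trying to reproduce is the DropConnect-style WeightDrop of the original implementation: keep the raw hidden-to-hidden weight as the trainable `*_raw` parameter and write back a dropped, non-leaf copy before every forward. A minimal sketch (the class and argument names are illustrative, not the exact code of either repo) would look like this:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightDrop(nn.Module):
    """DropConnect on the recurrent weights: only the raw weight is a
    trainable leaf parameter; a dropped copy is assigned before each forward."""

    def __init__(self, module, weights=('weight_hh_l0',), dropout=0.5):
        super().__init__()
        self.module = module
        self.weights = weights
        self.dropout = dropout
        for name in self.weights:
            raw = getattr(self.module, name)
            # Remove the original Parameter so only `<name>_raw` is seen
            # by the optimizer as a leaf parameter.
            del self.module._parameters[name]
            self.module.register_parameter(name + '_raw', nn.Parameter(raw.data))

    def _setweights(self):
        for name in self.weights:
            raw = getattr(self.module, name + '_raw')
            # The dropped weight is a non-leaf tensor derived from `*_raw`.
            setattr(self.module, name,
                    F.dropout(raw, p=self.dropout, training=self.training))

    def forward(self, *args, **kwargs):
        self._setweights()
        return self.module(*args, **kwargs)


# Usage sketch: wrap an LSTM and run a dummy batch (seq_len=5, batch=3).
lstm = WeightDrop(nn.LSTM(10, 10), weights=('weight_hh_l0',), dropout=0.5)
out, hidden = lstm(torch.randn(5, 3, 10))
```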

AndreaLK3 commented 4 years ago

I cloned the repo and executed the command specified here. My setup is PyTorch 1.5.0 with CUDA 10.1. The best validation PPL was only 80.6, already reached at epoch 100. Test set perplexity of the best model: 77.2.

Moreover, I got a warning at every `forward()` call that may or may not be relevant:

```
UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed.
Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor.
If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead.
See github.com/pytorch/pytorch/pull/30531 for more informations.
warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "
```

The warning is raised at this location:

```
File "main.py", line 140, in train
    norm = nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)
  File "/home/andrealk3/venvs/torch15/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 24, in clip_grad_norm_
    parameters = list(filter(lambda p: p.grad is not None, parameters))
  File "/home/andrealk3/venvs/torch15/lib/python3.6/site-packages/torch/nn/utils/clip_grad.py", line 24, in <lambda>
    parameters = list(filter(lambda p: p.grad is not None, parameters))
```
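If I had to guess, the warning fires because `clip_grad_norm_` touches `.grad` on everything returned by `model.parameters()`, and the weight-dropped copies of the recurrent weights are non-leaf tensors. A quick, purely hypothetical way to check which entries are non-leaf:

```
# Hypothetical diagnostic, assuming `model` is the AWD-LSTM being trained:
# print any entries of model.parameters() that are not leaf tensors, since
# accessing .grad on those is what triggers the warning in clip_grad_norm_.
for name, p in model.named_parameters():
    if not p.is_leaf:
        print(f"non-leaf: {name}, shape={tuple(p.shape)}")
```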

Since you managed to replicate the original results of 68.6/65.8, I guess the difference must come from changes in newer versions of PyTorch (or maybe fixing the warning would solve it? I don't know).
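If the warning itself turns out to matter, a minimal workaround (just a sketch, assuming the non-leaf tensors are indeed the weight-dropped copies) would be to clip only the leaf parameters in `train()`:

```
# Sketch of a possible change in main.py's train(): clip only leaf
# parameters (the *_raw weights and everything else the optimizer updates),
# skipping the per-forward dropped copies that trigger the warning.
leaf_params = [p for p in model.parameters() if p.is_leaf]
norm = nn.utils.clip_grad_norm_(leaf_params, args.max_grad_norm)
```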