ChengyueGongR / advsoft

Language Model Baselines for PyTorch

Language Modeling in PyTorch

This repository contains the code used for the paper *Improving Neural Language Modeling via Adversarial Training* (ICML 2019).

This code was originally forked from the awd-lstm-lm and MoS-awd-lstm-lm repositories.

In addition to the method in our paper, we also implement a recently proposed regularization technique called PartialShuffle. We find that combining this technique with our method further improves language model performance.
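As a rough illustration of the PartialShuffle idea (the function name and batch layout below are our own assumptions, not this repo's actual code): at the start of each epoch, every contiguous token stream in the batchified corpus gets an independent random circular shift, so segment boundaries change between epochs while local word order is preserved.

```python
import torch

def partial_shuffle(data):
    # data: LongTensor of shape (seq_len, batch_size), the batchified corpus,
    # where each column is one contiguous token stream.
    # Apply an independent random circular shift to each column.
    seq_len, batch_size = data.shape
    shifted = []
    for i in range(batch_size):
        offset = torch.randint(seq_len, (1,)).item()
        shifted.append(torch.cat([data[offset:, i], data[:offset, i]]))
    return torch.stack(shifted, dim=1)
```

Because the shift is circular, every token is still seen exactly once per epoch; only the positions where BPTT segments begin and end change.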

The repository comes with instructions for training word-level language models on the Penn Treebank (PTB), WikiText-2 (WT2), and WikiText-103 (WT103) datasets. (The code and pre-trained model for WikiText-103 will be merged into this branch soon.)

If you use this code or our results in your research, please cite:

@InProceedings{pmlr-v97-wang19f,
  title =    {Improving Neural Language Modeling via Adversarial Training},
  author =   {Wang, Dilin and Gong, Chengyue and Liu, Qiang},
  booktitle =    {Proceedings of the 36th International Conference on Machine Learning},
  pages =    {6555--6565},
  year =     {2019},
  editor =   {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume =   {97},
  series =   {Proceedings of Machine Learning Research},
  address =      {Long Beach, California, USA},
  month =    {09--15 Jun},
  publisher =    {PMLR},
}

Warning

Although the repo is implemented in PyTorch 0.4, we have found that the post-processing step (dynamic evaluation) only works well with PyTorch 0.2. We therefore provide a patch for dynamic evaluation, which should be run under PyTorch 0.2. We are working on fixing this problem; if you have any ideas, feel free to contact us.
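For readers unfamiliar with dynamic evaluation: it adapts the model to the test stream by taking a gradient step on each segment immediately after scoring it, so recent context influences later predictions. A minimal sketch under assumed interfaces (a model mapping a 1-D tensor of token ids to per-token logits; this is not the repo's actual script):

```python
import torch
import torch.nn.functional as F

def dynamic_eval(model, stream, bptt=8, lr=1e-3):
    # Walk over the test stream in bptt-sized segments: score each segment
    # with the current weights, then take one SGD step on that segment so
    # the model adapts to recent context. Returns average per-token
    # cross-entropy over the whole stream.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total, count = 0.0, 0
    for start in range(0, stream.numel() - 1, bptt):
        x = stream[start:start + bptt]
        y = stream[start + 1:start + bptt + 1]
        if y.numel() < x.numel():  # trim the final ragged segment
            x = x[:y.numel()]
        logits = model(x)
        loss = F.cross_entropy(logits, y)
        total += loss.item() * y.numel()
        count += y.numel()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return total / count
```

Note that the model's weights are mutated during evaluation, which is exactly why this step is sensitive to framework-version differences in optimizer and autograd behavior.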

MoS-AWD-LSTM + Adv + PartialShuffle

Open the folder mos-awd-lstm-lm to use the MoS-AWD-LSTM model, which achieves better performance but takes considerably longer to train.

PTB with MoS-AWD-LSTM

We first list the results without dynamic evaluation:

| Method | Valid PPL | Test PPL |
| --- | --- | --- |
| MoS | 56.54 | 54.44 |
| MoS + PartialShuffle | 55.89 | 53.92 |
| MoS + Adv | 55.08 | 52.97 |
| MoS + Adv + PartialShuffle | 54.10 | 52.20 |
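The Adv rows above come from adversarial training on word embeddings during MLE training. The paper derives a closed-form perturbation on the output embeddings; the sketch below is a simplified one-step, gradient-direction variant on input embeddings (all names and the model interface are illustrative assumptions, not this repo's API), shown only to convey the idea of training against a norm-bounded worst-case perturbation.

```python
import torch
import torch.nn.functional as F

def adversarial_embedding_loss(model, embeds, targets, epsilon=1.0):
    # One-step adversarial perturbation on (a detached copy of) the input
    # embeddings: compute the gradient of the loss w.r.t. the embeddings,
    # add an epsilon-scaled step in the gradient direction, and return the
    # loss on the perturbed embeddings for backprop into the model.
    embeds = embeds.detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeds), targets)
    grad, = torch.autograd.grad(loss, embeds)
    # normalize so epsilon controls the perturbation norm
    delta = epsilon * grad / (grad.norm() + 1e-12)
    return F.cross_entropy(model(embeds + delta), targets)
```

Minimizing this perturbed loss pushes the model to keep embeddings of different words well separated, which is the diversity effect the paper argues for.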

If you want to use Adv only, run the following command:

To use PartialShuffle, add the flag --partial. We used PartialShuffle only during the final finetuning stage and obtained 54.92 / 52.78 (validation / test) perplexity. You can download the pretrained model along with the log file, or train it from scratch.

WT2 with MoS-AWD-LSTM

If you want to use Adv only, run the following command:

To use PartialShuffle, add the flag --partial.

AWD-LSTM-LM + Adv (work in progress)

Open the folder awd-lstm-lm to use the AWD-LSTM model, which achieves good performance at a lower training cost.

PTB with AWD-LSTM

Run the following command:

You can download the pretrained-model along with the log file or train it from scratch.

WT2 with AWD-LSTM

Run the following command:

You can download the pretrained-model along with the log file or train it from scratch.