This is the code we used in our NIPS 2018 paper
Frequency-Agnostic Word Representation (Improving Word Embedding by Adversarial Training)
Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-yan Liu
The hyper-parameters are set for pytorch 0.4
version. The result of our paper is based on pytorch 0.2
, so some results are better (awd-lstm) while some results are worse (awd-lstm-mos).
The performance will change when changing GPU.
Therefore, the guide below can produce results similar to the numbers reported, but maybe not exact. If you have some difficulties at reproducing the final results, feel free to ask the first author for help (e-mail: cygong@pku.edu.cn)
You can download the pretrained model and the code here: pretrained_model.
The PPL after finetune is 57.7
/55.8
(valid / test). The PPL after post-process is 52.1
/51.6
(valid / test).
Run the following commands:
python3 main.py --nonmono 5 --batch_size 20 --data data/penn --dropouth 0.25 --dropouti 0.3 --dropout 0.4 --alpha 2 --beta 1 --seed 141 --epoch 2000 --save ./trained_model/ptb.pt
python3 pointer.py --data data/penn --save PTB.pt --lambdasm 0.08 --theta 1.4 --window 500 --bptt 5000
Run the following commands:
python3 -u main.py --epochs 4000 --nonmono 5 --data data/wikitext-2 --dropouti 0.5 --dropouth 0.2 --seed 1882 --save ./trained_model/wiki2.pt --moment_split 8000 --moment_lambda 0.02
or python3 main.py --epochs 4000 --nonmono 5 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --dropouti 0.5 --seed 1882 --adv_split 8000 --adv_lambda 0.02
python3 pointer.py --save WT2.pt --lambdasm 0.16 --theta 1.4 --window 4200 --bptt 2000 --data data/wikitext-2
Note: For pointer.py
, you may do a grid search for each trained model since it is very sentisitive to hyper-parameters.
Warning The dynamic evaluation contains some bugs for pytorch 0.4
(if you use original MoS with pytorch 0.4
, you will also meet with this problem). Now, we suggest you to add some patchs and run it in early version, e.g. pytorch 0.2
. Please check issue to know how to fix it.
For the pytroch 0.4.0
code, detailed information can be found in https://github.com/ChengyueGongR/Frequency-Agnostic/issues/2.
We can now achieve 56.00/53.82 after finetuning (it's 55.51/53.31 in our paper).
You can download the pretrained model and the code here: pretrained_model. The path for the final model is ./pretrained_ptb/finetune_model.pt
. (pytorch 0.4
)
You can download the pretrained model for pytorch 0.2
here: pretrained_model
Run the following commands:
python3 -u main.py --data data/penn --dropoutl 0.29 --dropouth 0.225 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 20.0 --epoch 600 --nhid 960 --nhidlast 620 --emsize 280 --n_experts 15 --save PTB --single_gpu --moment --adv --switch 160
python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
cp PATH_TO_FOLDER/finetune_model.pt PATH_TO_FOLDER/model.pt
and run python3 -u finetune.py --data data/penn --dropoutl 0.29 --dropouti 0.25 --gaussian 0.15 --dropouth 0.225 --seed 28 --batch_size 12 --lr 25.0 --epoch 1000 --nhid 960 --emsize 280 --n_experts 15 --save PATH_TO_FOLDER --single_gpu
(twice)A large portion of this repo is borrowed from the following repos: https://github.com/salesforce/awd-lstm-lm, https://github.com/zihangdai/mos, https://github.com/pytorch/fairseq and https://github.com/tensorflow/tensor2tensor.