Drastic impact of changing the vocabulary on perplexity

allenai / bilm-tf

TensorFlow implementation of contextualized word representations from bi-directional language models

Apache License 2.0

I was trying to train ELMo on an augmented version of the 1 Billion Word Benchmark corpus. The augmented sentences introduce some extra proper nouns, so I added these (a few thousand) to the default vocab. After just one epoch of training, the training perplexity dropped to nearly 4. I noticed that the code uses a sampled softmax, so I increased "n_negative_samples_batch" by 5x, but the perplexity is still nearly the same after one epoch. Isn't that weird? Any explanations?
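
For context, here is a minimal sketch of how perplexity relates to the loss, and why a number computed from a sampled-softmax loss may not be comparable to a full-vocabulary perplexity. It is a toy illustration in plain Python, not the bilm-tf code: the function names, the random logits, and the uniform negative sampling are all assumptions made for the example. A real sampled softmax (such as TensorFlow's `tf.nn.sampled_softmax_loss`) also corrects for the sampling distribution, which this sketch leaves out.

```python
import math
import random


def full_softmax_nll(logits, target):
    """Negative log-likelihood of `target` under a softmax over all logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sampled_softmax_nll(logits, target, n_negative, rng):
    """NLL over the target plus `n_negative` uniformly sampled negatives.

    Simplification (assumption): no correction for the sampling
    distribution is applied here.
    """
    candidates = [i for i in range(len(logits)) if i != target]
    negatives = rng.sample(candidates, n_negative)
    subset = [logits[target]] + [logits[i] for i in negatives]
    return full_softmax_nll(subset, 0)  # the target sits at index 0


def perplexity(nlls):
    """Perplexity is the exponential of the mean per-token NLL."""
    return math.exp(sum(nlls) / len(nlls))


# Hypothetical toy setup: random logits stand in for model outputs.
rng = random.Random(0)
vocab_size, n_tokens, n_negative = 1000, 200, 50

full_nlls, sampled_nlls = [], []
for _ in range(n_tokens):
    logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    target = rng.randrange(vocab_size)
    full_nlls.append(full_softmax_nll(logits, target))
    sampled_nlls.append(sampled_softmax_nll(logits, target, n_negative, rng))

full_ppl = perplexity(full_nlls)
sampled_ppl = perplexity(sampled_nlls)
print("full-vocabulary perplexity:", full_ppl)
print("sampled-softmax perplexity:", sampled_ppl)
```

In this simplified setup the sampled loss normalizes over a subset of the vocabulary, so it can never exceed the full-softmax loss, and the perplexity derived from it comes out lower. Whether this accounts for the value near 4 reported above is not established here; it only suggests that comparing against a perplexity evaluated with the full softmax on held-out data may be worthwhile.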

Drastic impact of changing the vocabulary on perplexity #233