allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0

Drastic impact of changing the vocabulary on perplexity #233

Open Krishnkant-Swarnkar opened 4 years ago

Krishnkant-Swarnkar commented 4 years ago

I was trying to train ELMo on an augmented version of the 1 Billion Word Benchmark corpus. The augmented sentences introduce some extra proper nouns, so I added them (a few thousand) to the default vocab. After just one epoch of training, the training perplexity dropped to near 4. Since the code uses a sampled softmax, I increased "n_negative_samples_batch" by 5x, but the perplexity stays nearly the same after one epoch. Isn't that weird? Any explanations?
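
For reference, a rough sketch of what I changed (file names, the token list, and the 5x factor are illustrative placeholders for my setup; only the option name n_negative_samples_batch comes from bin/train_elmo.py):

```python
# Hypothetical sketch of the vocab extension and the sampled-softmax tweak;
# paths and the augmented-token list are placeholders, not part of the repo.

# 1. Append the extra proper nouns to a copy of the default vocab file.
extra_tokens = ["SomeProperNoun1", "SomeProperNoun2"]  # a few thousand in practice

with open("vocab-2016-09-10.txt") as f:          # original vocab file (placeholder path)
    vocab = [line.strip() for line in f]

existing = set(vocab)
with open("vocab_augmented.txt", "w") as f:      # augmented vocab used for training
    for token in vocab + [t for t in extra_tokens if t not in existing]:
        f.write(token + "\n")

# 2. In the training options (see bin/train_elmo.py), scale up the number of
#    negative samples used by the sampled softmax. The 5x factor below is what
#    I tried; the base value mirrors the default in my copy of train_elmo.py.
options_override = {
    "n_negative_samples_batch": 8192 * 5,
}
```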

matt-peters commented 4 years ago

Yes, that is weird. Possible explanations are: