I was trying to train ELMo on an augmented version of the 1 Billion Benchmark corpus. The augmented sentences bring some extra proper nouns into the corpus, so I added these extra proper nouns (a few thousand) to the default vocab.
I noticed that the training perplexity went to near 4 (just in one epoch of training).
I noticed that the code uses a sampled softmax, so I increased the "n_negative_samples_batch" by 5x. Still, the perplexity remains nearly the same (after 1 epoch).
Isn't that weird? Any explanations?
Your augmented 1 Billion Benchmark is much easier for a language model to learn than the original 1 Billion Benchmark (and therefore the perplexity really is much lower).
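To put a number on "much easier": perplexity is the exponential of the mean per-token negative log-likelihood, so a perplexity near 4 means the model gives each target token a probability of roughly 0.25 on average (geometric mean). Here is a minimal sketch of that relationship. The probabilities are made up for illustration, and `perplexity` is a hypothetical helper, not the function the training code uses.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical "easy" corpus: every target token is predicted with probability 0.25.
ppl_easy = perplexity([math.log(0.25)] * 8)

# Hypothetical "hard" corpus: every target token is predicted with probability 1/40.
ppl_hard = perplexity([math.log(1.0 / 40.0)] * 8)

print(f"easy corpus perplexity: {ppl_easy:.2f}")
print(f"hard corpus perplexity: {ppl_hard:.2f}")
```

If the augmentation produces many near-duplicate or templated sentences, most tokens become highly predictable, which is one way the average per-token probability could climb this high after a single epoch. Comparing against perplexity on a held-out set of un-augmented sentences would help confirm or rule this out.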