allenai / bilm-tf

Tensorflow implementation of contextualized word representations from bi-directional language models
Apache License 2.0

Language model state change leads to changes in ELMo embedding? #176

Closed haoyangz closed 5 years ago

haoyangz commented 5 years ago

It seems that running this twice gives different embeddings for the exact same input. I assume it has to do with the fact that the internal state of the LM changes after the first pass?

If so, it would imply that the ordering of the batches in my new text corpus changes the ELMo embeddings I get for them. Is there a way to avoid this?

haoyangz commented 5 years ago

My working solution is to define a single init_op = tf.global_variables_initializer() operation up front and run sess.run(init_op) every time you pass a new batch through the language model.

Note that calling sess.run(tf.global_variables_initializer()) directly inside the loop will leak memory, because each call to tf.global_variables_initializer() adds a new op to the graph.
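For concreteness, here is a minimal sketch of that workaround, following the usage pattern in the repo's usage_character.py; the file paths and the example batches are placeholders:

```python
import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel, weight_layers

# Placeholders: point these at the pretrained model files and your data.
options_file = 'elmo_options.json'
weight_file = 'elmo_weights.hdf5'
vocab_file = 'vocab.txt'
batches = [
    [['First', 'sentence', 'of', 'batch', 'one', '.']],
    [['First', 'sentence', 'of', 'batch', 'two', '.']],
]  # placeholder: each batch is a list of tokenized sentences

batcher = Batcher(vocab_file, 50)
character_ids = tf.placeholder('int32', shape=(None, None, 50))
bilm = BidirectionalLanguageModel(options_file, weight_file)
elmo_op = weight_layers('input', bilm(character_ids), l2_coef=0.0)

# Build the initializer op ONCE; building it inside the loop grows the
# graph and leaks memory.
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    for batch in batches:
        # Re-initialize all variables (pretrained weights and the LSTM
        # state variables) before every batch, so the embeddings do not
        # depend on which batches were processed earlier.
        sess.run(init_op)
        ids = batcher.batch_sentences(batch)
        elmo_vecs = sess.run(elmo_op['weighted_op'],
                             feed_dict={character_ids: ids})
```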

matt-peters commented 5 years ago

This will most likely cause a substantial decrease in performance and I'd recommend not doing this. To convince yourself of that, compute the perplexity of a dataset with the pretrained model when you re-initialize the states after each batch versus when you do not. The internal states stabilize for all practical purposes after the first few batches, so the statefulness is a non-issue after that. To see this, run the same batch through the model multiple times and track the batch-to-batch difference in the ELMo representations.
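A quick sketch of that second check (same caveats: file paths and the sample sentences are placeholders): run one batch through the model repeatedly and watch how fast the representations stop changing.

```python
import numpy as np
import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel, weight_layers

# Placeholders: pretrained model files and a small sample batch.
options_file = 'elmo_options.json'
weight_file = 'elmo_weights.hdf5'
vocab_file = 'vocab.txt'
sentences = [['The', 'cat', 'sat', '.'], ['ELMo', 'is', 'contextual', '.']]

batcher = Batcher(vocab_file, 50)
character_ids = tf.placeholder('int32', shape=(None, None, 50))
bilm = BidirectionalLanguageModel(options_file, weight_file)
elmo_op = weight_layers('input', bilm(character_ids), l2_coef=0.0)
ids = batcher.batch_sentences(sentences)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    prev = None
    for step in range(10):
        # Same batch every pass; only the internal LM state variables change.
        vecs = sess.run(elmo_op['weighted_op'],
                        feed_dict={character_ids: ids})
        if prev is not None:
            # Max absolute change vs. the previous pass; expect this to
            # shrink to a negligible value after the first few passes.
            print(step, np.max(np.abs(vecs - prev)))
        prev = vecs
```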