allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

ELMo cache confusion #1825

Closed flyaway1217 closed 6 years ago

flyaway1217 commented 6 years ago

Hi there, I am following the tutorial to use the ELMo embeddings. The problem is that it seems we have to recompute the embeddings (run through the LSTMs) for every sentence, which is very slow. If I understand correctly, all we need to do is update the weights (gamma and s in Equation 1 of the paper) for the embedding of each layer. We don't need to run through the LSTMs every time because we do not update them.
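
For reference, this is the layer weighting I mean (Equation 1 of the ELMo paper, as I understand it): only the scalar gamma and the softmax-normalized weights s_j are task-specific, while the layer activations h come from the frozen biLM.

```latex
\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}
```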

In elmo.py I find that every time I call embeddings = elmo(character_ids), it runs through the whole network again. I think this is unnecessary. Is there any caching that can be done?
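
For context, this is roughly how I am calling it, following the tutorial (the options/weights paths below are placeholders for the files the tutorial links to):

```python
from allennlp.modules.elmo import Elmo, batch_to_ids

# Placeholder paths -- substitute the options/weights files from the tutorial.
options_file = "elmo_options.json"
weight_file = "elmo_weights.hdf5"

# One output representation, i.e. a single scalar mix over the biLM layers.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0)

sentences = [["First", "sentence", "."], ["Another", "one", "."]]
character_ids = batch_to_ids(sentences)

# Every call runs the char-CNN and both LSTM layers again, even though
# only the scalar mix parameters are being trained.
embeddings = elmo(character_ids)
print(embeddings["elmo_representations"][0].shape)  # (batch, timesteps, 1024) for the standard model
```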

DeNeutoy commented 6 years ago

The problem is that ELMo embeddings depend on context, so if the context of a word changes, its embedding changes too. You are right, though, that you could cache the embeddings for a full sentence - we provide that functionality too, via HDF5 files: https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#writing-contextual-representations-to-disk
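
As a rough sketch, once the representations have been written to disk following the tutorial, you can read them back with h5py without ever touching the biLM again (the key format and array shape below are assumptions; inspect the file's keys to confirm):

```python
import h5py
import torch

# Sketch: load cached ELMo layers for one sentence from the HDF5 file
# produced by following the tutorial above.
with h5py.File("elmo_embeddings.hdf5", "r") as fin:
    print(list(fin.keys())[:5])          # check how sentences are keyed
    # Assumed layout: one dataset per sentence, shape (num_layers, num_tokens, 1024).
    layers = torch.from_numpy(fin["0"][...])
    print(layers.shape)
```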

flyaway1217 commented 5 years ago

Hi @DeNeutoy, previously I did not need to train with a new loss function, and the approach described in https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#writing-contextual-representations-to-disk worked well for me. Now, however, I have new loss functions to tune, and what I want to do is update only the scalar weights. The same problem comes up: the default Elmo class can update the scalar weights, but it also has to run through the LSTMs, which is very slow. Is there any way to cache all the sentences (the training sentences are fixed, so they could be cached) and update only the scalar weights without going through the LSTMs?
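
To make this concrete, what I have in mind is roughly the sketch below (my own code, not an existing AllenNLP API): precompute and cache the per-layer activations once, then train only a small scalar-mix module on top of them.

```python
import torch
import torch.nn as nn


class ScalarMixOverCachedLayers(nn.Module):
    """Learn gamma and softmax-normalized layer weights s_j over
    pre-computed (frozen) biLM layer activations."""

    def __init__(self, num_layers: int = 3):
        super().__init__()
        self.scalar_parameters = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, cached_layers: torch.Tensor) -> torch.Tensor:
        # cached_layers: (num_layers, num_tokens, dim), loaded from the cache.
        weights = torch.softmax(self.scalar_parameters, dim=0)
        mixed = (weights.view(-1, 1, 1) * cached_layers).sum(dim=0)
        return self.gamma * mixed


# Usage sketch: only `mix`'s handful of parameters receive gradients,
# so no LSTM forward pass is needed during training.
mix = ScalarMixOverCachedLayers(num_layers=3)
cached = torch.randn(3, 7, 1024)  # stand-in for one cached sentence
optimizer = torch.optim.Adam(mix.parameters(), lr=1e-3)
output = mix(cached)              # (7, 1024) task-specific ELMo vectors
```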