UKPLab / elmo-bilstm-cnn-crf

BiLSTM-CNN-CRF architecture for sequence tagging using ELMo representations.
Apache License 2.0

:: Lookup embeddings and tokens (this might take a while) :: Killed #12

Open ghost opened 5 years ago

ghost commented 5 years ago

When I ran Train_Chunking.py, I received a Killed message. Do you have any idea why?

nreimers commented 5 years ago

Did you maybe run out of memory?

This implementation is sadly quite memory-hungry and easily needs 4-8 GB of RAM.

ghost commented 5 years ago

I have 8 GB of RAM in my PC; I am not sure why that happened. Can I use this model for biomedical named entity recognition (such as the LINNAEUS corpus for NER of species) as is, or will it need some adaptations? Sorry, I am not very expert at programming; excuse me if that sounds silly.

nreimers commented 5 years ago

8 GB of RAM is quite low for this implementation, especially if your data set is rather large.

I can recommend the non-ELMo version here: https://github.com/UKPLab/emnlp2017-bilstm-cnn-crf/

Or use the system from AllenNLP (which is, however, substantially slower): https://github.com/allenai/allennlp

ghost commented 5 years ago

Hi,

I have just tested the code on another computer with 16 GB of RAM, and now the code returns this error:

```
:: Lookup embeddings and tokens (this might take a while) ::
Traceback (most recent call last):
  File "Train_Chunking.py", line 57, in <module>
    pickleFile = perpareDataset(datasets, embLookup)
  File "/home/icean/Documents/Gitstuff/elmo-bilstm-cnn-crf/util/preprocessing.py", line 44, in perpareDataset
    pkl.dump(pklObjects, f, -1)
MemoryError
```

nreimers commented 5 years ago

Hi @BuggyLife, I just pushed a new version to the repository.

This version computes the ELMo embeddings on-the-fly, which significantly reduces the memory footprint to about 2 GB.

Caching of ELMo embeddings (for faster training & evaluation time) is now optional and can be enabled.
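For later readers, a minimal sketch of what the on-the-fly vs. cached setup might look like in a training script. The constructor arguments, file paths, and the `loadCache` call are assumptions for illustration only (only the `ELMoWordEmbeddings` class and the `embLookup` variable appear in the tracebacks in this thread), so check the current Train_Chunking.py for the exact API:

```python
from neuralnets.ELMoWordEmbeddings import ELMoWordEmbeddings

# Traditional word embeddings plus the ELMo option/weight files
# (paths here are illustrative, not the repository defaults).
embLookup = ELMoWordEmbeddings('embeddings/komninos_english_embeddings.gz',
                               'pretrained/elmo_options.json',
                               'pretrained/elmo_weights.hdf5')

# By default, ELMo vectors are now computed on-the-fly (~2 GB RAM).
# Optionally enable a disk cache for faster training/evaluation; this is
# the part that can exhaust memory on small machines. The method name and
# cache path below are hypothetical:
# embLookup.loadCache('embeddings/elmo_cache_chunking.pkl')
```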

Please try the new version and give me feedback on whether it now works for you.

ghost commented 5 years ago

I was able to run the code with caching for ELMo embeddings enabled.

Thanks for being around on the weekend.

ghost commented 5 years ago

Hi,

I have just tested the code with another dataset and got the following error:

```
Training: 426 Batch [37:40, 188.28s/ Batch]
Traceback (most recent call last):
  File "Train_NER.py", line 97, in <module>
    model.fit(epochs=25)
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 420, in fit
    self.trainModel()
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 304, in trainModel
    for batch in tqdm(self.minibatch_iterate_dataset(), total=self.trainRangeLength, desc="Training", unit=' Batch'):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 999, in __iter__
    for obj in iterable:
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 390, in minibatch_iterate_dataset
    batches[modelName].extend(self.getInputData(trainMatrix, range(dataRange[0], dataRange[1])))
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 515, in getInputData
    inputData = np.asarray(self.embeddingsLookup.batchLookup(batch_sentences, pureFeatureName))
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoWordEmbeddings.py", line 99, in batchLookup
    return np.asarray(self.getElmoEmbedding(sentences))
  File "/home/sahbi/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoWordEmbeddings.py", line 142, in getElmoEmbedding
    for elmo_vectors in self.elmo.embed_sentences(non_cached_sentences):
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/elmo.py", line 250, in embed_sentences
    yield from self.embed_batch(batch)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/elmo.py", line 218, in embed_batch
    embeddings, mask = self.batch_to_embeddings(batch)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/commands/elmo.py", line 162, in batch_to_embeddings
    bilm_output = self.elmo_bilm(character_ids)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/modules/elmo.py", line 579, in forward
    token_embedding = self._token_embedder(inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/allennlp/modules/elmo.py", line 348, in forward
    convolved = conv(character_embedding)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 176, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: $ Torch: not enough memory: you tried to allocate 0GB. Buy new RAM! at /pytorch/aten/src/TH/THGeneral.c:218
```

I am trying to use pretrained word2vec embeddings with the BC4CHEMDNER dataset.

nreimers commented 5 years ago

It appears that you don't have enough memory.

Did you try to turn off the caching of the ELMo embeddings?

Did you use the original word2vec embeddings from Mikolov? These are rather large, as they also include bigrams, and these bigram embeddings cannot be used by this architecture. You could try embeddings with smaller vocabulary sizes, for example, some of the GloVe embeddings.
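If you want to stay with word2vec, one option is to strip the phrase (bigram) entries and write a much smaller plain-text embedding file before training. A rough sketch using gensim; this is not part of this repository, and the file names and the assumed "word value value ..." text format (as used by the gzipped GloVe/Komninos embedding files) are illustrative:

```python
import gzip
from gensim.models import KeyedVectors

# Load the large original word2vec binary (e.g. GoogleNews-vectors-negative300.bin).
kv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# Mikolov's phrase/bigram entries contain '_' (e.g. 'New_York'); keep unigrams only.
# (index_to_key is gensim 4.x; older versions call it index2word.)
unigrams = [w for w in kv.index_to_key if '_' not in w]

# Write a gzipped "word v1 v2 ..." text file with the reduced vocabulary.
with gzip.open('word2vec_unigrams.gz', 'wt', encoding='utf-8') as out:
    for w in unigrams:
        out.write(w + ' ' + ' '.join('%.6f' % v for v in kv[w]) + '\n')
```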

ghost commented 5 years ago

Hi @nreimers

After disabling the ELMo caching it is working. I am currently using this model for NER of biomedical entities: I simply changed the datasets and embeddings variables in Train_chunking.py and did some experiments. I just wanted to confirm that this is the right way to do NER with this code. Another issue is that I see my non-trainable parameters are "0"; is there something wrong with my experiments? I am writing a paper in which I am going to link this GitHub repo, as I have used it for the experiments.

nreimers commented 5 years ago

Sounds good, I don't see any issue with what you described about your experiments.

Non-trainable parameters = 0 is what is expected for this architecture. The word embeddings are kept outside the Keras model and are not updated during training. The weights (parameters) for the LSTMs and the CRF are all updated during training.
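To make the "non-trainable parameters = 0" point concrete, here is a stripped-down Keras sketch (made-up layer sizes, a softmax output instead of the repository's CRF layer) of a model that receives pre-computed word vectors as input, so there is no frozen Embedding layer inside the graph:

```python
from keras.models import Model
from keras.layers import Input, Bidirectional, LSTM, TimeDistributed, Dense

emb_dim = 300      # dimensionality of the pre-computed word/ELMo vectors (example value)
num_labels = 10    # number of BIO tags (example value)

# The lookup of word2vec/ELMo vectors happens outside Keras, so the model's
# input is already a sequence of dense vectors; there is no Embedding layer
# whose frozen weights would show up as non-trainable parameters.
tokens_in = Input(shape=(None, emb_dim), name='precomputed_token_vectors')
hidden = Bidirectional(LSTM(100, return_sequences=True))(tokens_in)
output = TimeDistributed(Dense(num_labels, activation='softmax'))(hidden)

model = Model(tokens_in, output)
model.summary()   # prints "Non-trainable params: 0"; all LSTM/Dense weights are trained
```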

ghost commented 5 years ago

Do you mean that the weights of ELMo don't get updated during the training?

nreimers commented 5 years ago

ELMo weights in general do not get updated. But I was referring to the traditional word embeddings, like word2vec, which are also not updated in this setup.

The only part of ELMo that gets updated is the scalars for the weighted average, if you choose 'weighted average' as the ELMo mode.
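For context, a tiny NumPy sketch of what the 'weighted average' ELMo mode boils down to: the three biLM layers stay frozen, and only a handful of scalars that mix them are learned (the scalar values below are made up):

```python
import numpy as np

# Dummy ELMo output for a 7-token sentence: 3 layers x 7 tokens x 1024 dims.
elmo_layers = np.random.randn(3, 7, 1024)

# In 'weighted average' mode the trainable ELMo-related parameters are just
# these per-layer scalars (softmax-normalized) and an overall scale gamma.
s = np.array([0.2, 0.5, 0.3])
gamma = 1.0

# Weighted sum over the layer axis -> one 1024-dim vector per token.
token_embeddings = gamma * np.tensordot(s, elmo_layers, axes=(0, 0))
print(token_embeddings.shape)   # (7, 1024)
```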