UKPLab / elmo-bilstm-cnn-crf

BiLSTM-CNN-CRF architecture for sequence tagging using ELMo representations.
Apache License 2.0
388 stars 81 forks

KeyError #17

Open Zawan-uts opened 5 years ago

Zawan-uts commented 5 years ago

Hi, I get the following error:

Traceback (most recent call last):
  File "Train_NER.py", line 95, in
    model.setDataset(datasets, data)
  File "/home/hanah/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 86, in setDataset
    self.idx2Labels[modelName] = {v: k for k, v in self.mappings[labelKey].items()}
KeyError: 'NER_BIO'
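The failing line inverts a label-to-index mapping into an index-to-label mapping, so the error means `self.mappings` has no entry under the label key. A minimal sketch of that behavior, assuming (hypothetically) that `mappings` only holds an entry for an IOBES column; the dictionary contents and the helper name `invert_label_mapping` are made up for illustration and are not the library's code:

```python
# Hypothetical contents: the real self.mappings is built by the library from
# the dataset. Here it deliberately has no 'NER_BIO' entry.
mappings = {
    'tokens': {'acid': 0, 'tert': 1},
    'NER_IOBES': {'O': 0, 'I-M': 1, 'S-M': 2},
}

def invert_label_mapping(mappings, label_key):
    """Invert label -> index into index -> label, like the line in the traceback."""
    return {v: k for k, v in mappings[label_key].items()}

idx2label = invert_label_mapping(mappings, 'NER_IOBES')
print(idx2label)  # {0: 'O', 1: 'I-M', 2: 'S-M'}

try:
    invert_label_mapping(mappings, 'NER_BIO')
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'NER_BIO'
```

The dictionary comprehension itself is fine; the lookup `mappings[label_key]` fails before it runs.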

I set the parameters like this:

{'columns': {0:'tokens', 4:'NER_BIO'}}, and my data looks like this:

- EP1764360_0095 5566 5567 I-M I-M
pyridin-2-yl)-carbamic EP1764360_0095 5567 5589 I-M I-M
acid EP1764360_0095 5590 5594 I-M I-M
tert EP1764360_0095 5595 5599 I-M I-M

nreimers commented 5 years ago

Could you paste your complete definition of the datasets variable?

Zawan-uts commented 5 years ago

0: tokens, 1: document ID, 2: start offset, 3: end offset, 4: BIO encoding, 5: IOBES encoding
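With that column layout, the setting `{0:'tokens', 4:'NER_BIO'}` should pick the token from column 0 and the BIO tag from column 4. A rough sketch of that selection on the sample data posted in this issue, assuming whitespace-separated columns and blank lines between sentences; `select_columns` is a hypothetical helper, not the repository's reader:

```python
# Sample rows from the issue: token, document ID, start, end, BIO tag, IOBES tag.
SAMPLE = """\
- EP1764360_0095 5566 5567 I-M I-M
pyridin-2-yl)-carbamic EP1764360_0095 5567 5589 I-M I-M
acid EP1764360_0095 5590 5594 I-M I-M
tert EP1764360_0095 5595 5599 I-M I-M
"""

def select_columns(lines, columns):
    """Collect the requested column indices into named lists (hypothetical helper)."""
    selected = {name: [] for name in columns.values()}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines separate sentences in CoNLL-style files
        splits = line.split()  # assumption: any whitespace separates columns
        for idx, name in columns.items():
            selected[name].append(splits[idx])
    return selected

result = select_columns(SAMPLE.splitlines(), {0: 'tokens', 4: 'NER_BIO'})
print(result['tokens'])   # ['-', 'pyridin-2-yl)-carbamic', 'acid', 'tert']
print(result['NER_BIO'])  # ['I-M', 'I-M', 'I-M', 'I-M']
```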

nreimers commented 5 years ago

I mean the datasets variable in your Python script.

See https://github.com/UKPLab/elmo-bilstm-cnn-crf/blob/master/Train_Chunking.py line 44

Zawan-uts commented 5 years ago

datasets = {
    'Biosemantics':
        {'columns': {0:'tokens', 4:'NER_BIO'},
         'label': 'NER_BIO',
         'evaluate': True,
         'commentSymbol': None}
}
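One quick way to rule out a configuration typo is to confirm that `label` names one of the `columns` values and that a `tokens` column exists. A minimal sketch; `check_dataset_config` is a hypothetical helper and not part of the repository. For the configuration above it returns an empty list:

```python
datasets = {
    'Biosemantics': {
        'columns': {0: 'tokens', 4: 'NER_BIO'},
        'label': 'NER_BIO',
        'evaluate': True,
        'commentSymbol': None,
    }
}

def check_dataset_config(datasets):
    """Return a list of problems found in a datasets dict (hypothetical helper)."""
    problems = []
    for name, cfg in datasets.items():
        column_names = set(cfg['columns'].values())
        if 'tokens' not in column_names:
            problems.append(f"{name}: no column is named 'tokens'")
        if cfg['label'] not in column_names:
            problems.append(f"{name}: label {cfg['label']!r} is not one of the column names")
    return problems

print(check_dataset_config(datasets))  # []
```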

nreimers commented 5 years ago

Could it be that the CoNLL file is not split correctly? Check that, when the CoNLL file is read, each token has the columns 'tokens' and 'NER_BIO'.

Maybe individual lines are malformed, so that not enough columns are extracted, which then causes the KeyError.
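Following that suggestion, a file can be scanned for non-blank lines with fewer columns than the `columns` setting needs. A minimal sketch, assuming whitespace-separated columns; `find_short_lines` is a hypothetical helper, and the truncated third sample line is made up to show a hit:

```python
def find_short_lines(lines, required_columns):
    """Return (line_number, column_count) for non-blank lines with too few
    whitespace-separated columns. required_columns is the highest column
    index used in the 'columns' setting plus one."""
    short = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # sentence separator
        count = len(stripped.split())
        if count < required_columns:
            short.append((lineno, count))
    return short

columns = {0: 'tokens', 4: 'NER_BIO'}
sample = [
    "acid EP1764360_0095 5590 5594 I-M I-M\n",
    "\n",
    "tert EP1764360_0095 5595 5599\n",  # made-up truncated line
]
print(find_short_lines(sample, max(columns) + 1))  # [(3, 4)]

# On a real file (path is hypothetical):
# with open('data/Biosemantics/train.txt', encoding='utf-8') as f:
#     print(find_short_lines(f, max(columns) + 1))
```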

Zawan-uts commented 5 years ago

It works fine when I select column 5 (IOBES encoding) but throws the error with column 4 (BIO encoding). If the lines were malformed, it would not have worked with column 5 either.
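Since column 4 is described as BIO and column 5 as IOBES, the two columns should agree once IOBES tags are mapped back to BIO (`S-` becomes `B-`, `E-` becomes `I-`). A sketch of that cross-check as a hypothetical diagnostic for spotting rows where column 4 holds something unexpected; the helper names are made up, and only the first sample row comes from the issue:

```python
def iobes_to_bio(tag):
    """Map one IOBES tag to BIO: S-X -> B-X, E-X -> I-X; B-, I- and O are unchanged."""
    if tag.startswith('S-'):
        return 'B-' + tag[2:]
    if tag.startswith('E-'):
        return 'I-' + tag[2:]
    return tag

def find_inconsistent_rows(lines, bio_col=4, iobes_col=5):
    """Return (line_number, bio_tag, iobes_tag) where the two columns disagree."""
    mismatches = []
    for lineno, line in enumerate(lines, start=1):
        splits = line.split()
        if len(splits) <= max(bio_col, iobes_col):
            continue  # skip blank or short lines
        bio, iobes = splits[bio_col], splits[iobes_col]
        if iobes_to_bio(iobes) != bio:
            mismatches.append((lineno, bio, iobes))
    return mismatches

rows = [
    "acid EP1764360_0095 5590 5594 I-M I-M",
    "tert EP1764360_0095 5595 5599 I-M E-M",  # made-up, consistent
    "x EP1764360_0095 5600 5601 O S-M",       # made-up, inconsistent
]
print(find_inconsistent_rows(rows))  # [(3, 'O', 'S-M')]
```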

nreimers commented 5 years ago

Maybe you can send me your training Python script and the dataset? Then I can have a look. Rnils@web.de

Zawan-uts commented 5 years ago

Hi @nreimers, I have tried sending an email to the given address multiple times, but it doesn't work: the message is undeliverable.

nreimers commented 5 years ago

Then try: reimers@ukp.informatik.tu-darmstadt.de

Please try not to send very large files; they might be rejected by the mail server. If your dataset is too large, upload it somewhere and send me the link.

Zawan-uts commented 5 years ago

sent