Open Zawan-uts opened 5 years ago
Could you paste your complete definition of the datasets variable?
0: tokens, 1: document ID, 2: start offset, 3: end offset, 4: BIO encoding, 5: IOBES encoding
I mean the datasets variable in your Python script.
See https://github.com/UKPLab/elmo-bilstm-cnn-crf/blob/master/Train_Chunking.py line 44
datasets = {
    'Biosemantics':
        {'columns': {0:'tokens', 4:'NER_BIO'},
         'label': 'NER_BIO',
         'evaluate': True,
         'commentSymbol': None}
}
Could it be that the CoNLL file is not split correctly? Check that, when the CoNLL file is read, each token has the columns 'tokens' and 'NER_BIO'.
Maybe individual lines are malformed, so that not enough columns are extracted, which then causes the KeyError.
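One way to run the check described above is to scan the file for lines that have fewer columns than the configured indices require. This is only a minimal sketch: `find_short_lines` is a hypothetical helper, not part of the repository, and it assumes that columns are separated by whitespace and that blank lines mark sentence boundaries.

```python
def find_short_lines(lines, required_cols, comment_symbol=None):
    """Return (line_number, line) pairs that have too few columns.

    Hypothetical helper: assumes whitespace-separated columns and
    blank lines as sentence boundaries.
    """
    min_cols = max(required_cols) + 1  # column indices are 0-based
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # blank line: sentence boundary
        if comment_symbol is not None and line.startswith(comment_symbol):
            continue  # comment line
        if len(line.split()) < min_cols:
            bad.append((lineno, line))
    return bad


# Illustrative sample; the second line is deliberately missing its last column.
sample = [
    "pyridin-2-yl)-carbamic EP1764360_0095 5567 5589 I-M I-M\n",
    "acid EP1764360_0095 5590 5594 I-M\n",
    "\n",
    "tert EP1764360_0095 5595 5599 I-M I-M\n",
]

short_for_col5 = find_short_lines(sample, [0, 5])
short_for_col4 = find_short_lines(sample, [0, 4])
```

For a real file, the same function can be called on an open file handle, for example `find_short_lines(open(path, encoding="utf-8"), [0, 4])`, where `path` points to the training file.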
It works fine when I select column 5 for the IOBES encoding, but it throws the error with column 4 for the BIO encoding. If the lines were malformed, it would not have worked with column 5 either.
Maybe you can send me your training Python script and the dataset? Then I can have a look. Rnils@web.de
Hi @nreimers, I have tried sending an email to the given address multiple times. It doesn't work; the message comes back as undeliverable.
Then try: reimers@ukp.informatik.tu-darmstadt.de
Please try not to send very large files, as they might get rejected by the mail server. If your dataset is too large, upload it somewhere and send me the link.
sent
Hi, I get the following error:
Traceback (most recent call last):
  File "Train_NER.py", line 95, in
    model.setDataset(datasets, data)
  File "/home/hanah/Documents/Gitstuff/elmo-bilstm-cnn-crf-v02/neuralnets/ELMoBiLSTM.py", line 86, in setDataset
    self.idx2Labels[modelName] = {v: k for k, v in self.mappings[labelKey].items()}
KeyError: 'NER_BIO'
I set the parameters like this:
{'columns': {0:'tokens', 4:'NER_BIO'}}
and my data looks like this:
EP1764360_0095 5566 5567 I-M I-M
pyridin-2-yl)-carbamic EP1764360_0095 5567 5589 I-M I-M
acid EP1764360_0095 5590 5594 I-M I-M
tert EP1764360_0095 5595 5599 I-M I-M
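The line that fails in the traceback inverts `self.mappings[labelKey]`, so the KeyError only says that the mappings dictionary has no 'NER_BIO' entry at that point. A stand-alone sketch of the same dictionary inversion, which also reports which keys are present, might help narrow this down. The `mappings` content and the `invert_label_mapping` helper below are made up for illustration and are not taken from the repository.

```python
def invert_label_mapping(mappings, label_key):
    """Invert a label-to-index dict; list the available keys if label_key is absent."""
    if label_key not in mappings:
        raise KeyError(
            f"{label_key!r} not in mappings; available keys: {sorted(mappings)}"
        )
    return {idx: label for label, idx in mappings[label_key].items()}


# Hypothetical mappings, for illustration only.
mappings = {
    "tokens": {"acid": 0, "tert": 1},
    "NER_IOBES": {"O": 0, "I-M": 1},
}

idx2label = invert_label_mapping(mappings, "NER_IOBES")

try:
    invert_label_mapping(mappings, "NER_BIO")
except KeyError as err:
    message = err.args[0]  # names the missing key and the keys that do exist
```

Printing the keys of the real mappings just before the failing line would show in the same way whether 'NER_BIO' was ever created.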