glample / tagger

Named Entity Recognition Tool
Apache License 2.0

KeyError: u'S-PRT' #52

Closed minhson-kaist closed 6 years ago

minhson-kaist commented 7 years ago

Dear Mr. Lample, I tried to run your code on my own data, but something went wrong, so I tried the conll2002 data instead. During training, the terminal showed this error: KeyError: u'S-PRT'. I printed all the str_words and could not find this key, so I think it comes from the model. How can I fix it? Thank you!

glample commented 7 years ago

Hi,

Indeed, this is not one of the words; it is supposed to be a tag. I don't see how this can happen, though. Can you provide the complete traceback?

minhson-kaist commented 7 years ago

Hi,

I finally understood the problem and solved it. It was caused by the lack of the PRT label in the dev or test set: I had copied only a small part of the training and testing sets from the conll2002 data, which is why this problem happened.

Thank you for your response.

Regards.
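
For readers hitting the same error, the mismatch between splits can be checked before training. This is a minimal sketch, not part of the tagger code: it assumes each sentence is a list of token rows whose last element is the tag (as the w[-1] lookup in the traceback later in this thread suggests), and the helper name unseen_tags and the sample data are hypothetical.

```python
def unseen_tags(train_sentences, eval_sentences):
    """Return the tags that occur in eval_sentences but never in train_sentences."""
    # Assumption: the tag is the last field of each token row.
    train_tags = {w[-1] for s in train_sentences for w in s}
    eval_tags = {w[-1] for s in eval_sentences for w in s}
    return sorted(eval_tags - train_tags)


# Hypothetical toy data: the dev split contains tags the training split lacks.
train = [[["John", "B-PER"], ["sleeps", "O"]]]
dev = [[["up", "S-PRT"], ["Paris", "B-LOC"], ["is", "O"]]]

missing = unseen_tags(train, dev)
print(missing)
```

Any tag this reports would be a candidate for the KeyError, under the assumption that the tag mapping is built from the training split only.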

IsabelMeraner commented 5 years ago

Is there a workaround if a tag does not occur in both the training and the test/dev sets?
I am using k-fold cross-validation to split my small dataset, and in one fold all occurrences of a certain low-frequency tag fall in the test set and none in the training set.

The error I get looks like this:

Found 8 unique named entity tags
Traceback (most recent call last):
  File "train.py", line 191, in
    test_sentences, word_to_id, char_to_id, tag_to_id, lower
  File "./../tagger/loader.py", line 146, in prepare_dataset
    tags = [tag_to_id[w[-1]] for w in s]
KeyError: u'B-lat_fam'

Do you have any suggestions, @glample?
Thank you so much.
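
One possible workaround, offered as a sketch rather than as the repository's own method, is to build the tag mapping from all splits together, so that every tag has an id before the lookup shown in the traceback runs. The function build_tag_mapping and the sample data below are hypothetical, and the sketch assumes the tag is the last field of each token row.

```python
def build_tag_mapping(*splits):
    """Build tag <-> id mappings from the union of tags across all given splits."""
    # Assumption: each split is a list of sentences, each sentence a list of
    # token rows, and the tag is the last field of each row.
    tags = sorted({w[-1] for sentences in splits for s in sentences for w in s})
    tag_to_id = {tag: i for i, tag in enumerate(tags)}
    id_to_tag = {i: tag for tag, i in tag_to_id.items()}
    return tag_to_id, id_to_tag


# Hypothetical fold in which one tag appears only in the test split.
train_sentences = [[["Homo", "O"]]]
test_sentences = [[["Felidae", "B-lat_fam"]]]

tag_to_id, id_to_tag = build_tag_mapping(train_sentences, test_sentences)

# The same kind of lookup as in the traceback no longer raises KeyError.
test_tag_ids = [[tag_to_id[w[-1]] for w in s] for s in test_sentences]
print(tag_to_id, test_tag_ids)
```

This only avoids the lookup failure. The model still sees no training examples of the tag in that fold, so it cannot be expected to predict it there; splitting the data so that every tag appears in each training fold would address that part.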