glample / tagger

Named Entity Recognition Tool
Apache License 2.0

Confidence score for the predicted entity #81

Closed: ishita1995 closed this issue 6 years ago

ishita1995 commented 6 years ago

I am not able to find a way to get the confidence score of an entity predicted by the model. Is there a way to calculate it?

glample commented 6 years ago

The LSTM-CRF model predicts a tag sequence for the entire sentence. As a result, there is no real notion of an entity score, only a sequence score (and the model returns the sequence with the best score). However, you can take the average of the LSTM probability scores over the words of your entity, and this should give you a good proxy for a confidence score. For instance, if your sentence contains "Barack Obama" and the model tags these two words as "B_PER" and "E_PER", you can report the average (or the product) of P(B_PER|Barack) and P(E_PER|Obama) given by the model.
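
A minimal NumPy sketch of this averaging, assuming you already have an `(n_words, n_tags)` matrix of per-word tag probabilities and the predicted tag index for each word. The helper names `entity_spans` and `entity_confidences` are hypothetical, and the tags are assumed to follow the IOBES scheme written with a hyphen (`B-PER`, `E-PER`, `S-LOC`, `O`) rather than the underscore used above; adjust the parsing to match your own tag dictionary.

```python
import numpy as np


def entity_spans(tags):
    """Group IOBES tags into (start, end, entity_type) spans, end inclusive.

    Assumption: tags look like 'B-PER', 'I-PER', 'E-PER', 'S-LOC' or 'O'.
    """
    spans = []
    start = None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":  # single-word entity
            spans.append((i, i, etype))
            start = None
        elif prefix == "B":  # entity begins here
            start = i
        elif prefix == "E" and start is not None:  # entity ends here
            spans.append((start, i, etype))
            start = None
        elif prefix == "O":  # outside any entity
            start = None
        # 'I' simply continues the current span
    return spans


def entity_confidences(probs, tag_ids, id_to_tag):
    """Mean and product of P(predicted tag | word) over each predicted entity."""
    tags = [id_to_tag[int(t)] for t in tag_ids]
    # Probability the model assigned to the tag it actually predicted, per word.
    picked = probs[np.arange(len(tag_ids)), tag_ids]
    results = []
    for start, end, etype in entity_spans(tags):
        span = picked[start:end + 1]
        results.append({
            "span": (start, end),
            "type": etype,
            "mean": float(span.mean()),
            "product": float(span.prod()),
        })
    return results
```

The product penalizes long entities more than the mean does, since every extra word multiplies in a factor below 1; the mean is easier to compare across entities of different lengths.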

ishita1995 commented 6 years ago

I understand what you are saying above, but I am quite new to Theano, so I am sorry if I have this wrong. When calling the forward function in f_eval, I expected the alpha variable to return the probabilities, but when I set return_best_sequence to False, the code breaks with the following error:

File "tagger.py", line 49, in classify_ner
    y_preds = np.array(f_eval(*input))[1:-1]
IndexError: too many indices for array
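
A likely explanation, assuming `forward` with `return_best_sequence=False` returns the CRF log-partition score: the forward algorithm sums over all possible tag sequences, so its result is a single scalar rather than a tag sequence, and slicing a 0-dimensional array with `[1:-1]` raises exactly this `IndexError`. The sketch below is plain NumPy, not the repository's Theano code; `crf_log_partition`, `crf_viterbi` and `sequence_confidence` are hypothetical names, `trans[i, j]` is assumed to score a transition from tag `i` to tag `j`, and any start/end padding (presumably what the `[1:-1]` slice strips) is omitted. It also shows one way to turn the best sequence score into a probability, `exp(best_score - log_z)`.

```python
import numpy as np


def logsumexp(x, axis=None):
    """Numerically stable log(sum(exp(x))) along an axis (or over all elements)."""
    m = np.max(x, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return s.reshape(()) if axis is None else np.squeeze(s, axis=axis)


def crf_log_partition(obs, trans):
    """Forward algorithm: log Z over all tag sequences. Returns a 0-d array (a scalar)."""
    alpha = obs[0]
    for t in range(1, len(obs)):
        # alpha[j] = logsumexp_i(alpha[i] + trans[i, j]) + obs[t, j]
        alpha = logsumexp(alpha[:, None] + trans, axis=0) + obs[t]
    return logsumexp(alpha)


def crf_viterbi(obs, trans):
    """Viterbi decoding: best tag sequence and its (unnormalized) score."""
    score = obs[0]
    backptrs = []
    for t in range(1, len(obs)):
        cand = score[:, None] + trans  # cand[i, j]: best path ending in tag i, then i -> j
        backptrs.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + obs[t]
    best = [int(score.argmax())]
    for bp in reversed(backptrs):
        best.append(int(bp[best[-1]]))
    best.reverse()
    return best, float(score.max())


def sequence_confidence(obs, trans):
    """Probability the CRF assigns to its own best sequence."""
    path, best_score = crf_viterbi(obs, trans)
    log_z = float(crf_log_partition(obs, trans))
    return path, float(np.exp(best_score - log_z))
```

A value close to 1 means the CRF puts almost all of its probability mass on the returned sequence; note that this is a confidence for the whole sentence, not for a single entity.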

glample commented 6 years ago

What you want to look at is probably the tag probability scores: https://github.com/glample/tagger/blob/c735605fc2218975019aca04bd64796ac4f363bd/model.py#L278
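
If those scores are unnormalized (as one would expect when the CRF layer is enabled, since a CRF works on raw scores), a row-wise softmax turns them into per-word tag probabilities that can be plugged into the averaging suggested in the first answer. A NumPy sketch, assuming you have already evaluated the scores into an `(n_words, n_tags)` array; how you expose that tensor from the compiled Theano function is not shown, and `tag_probabilities` and `predicted_tag_probabilities` are hypothetical helpers. If your model already applies a softmax (no CRF), skip the conversion.

```python
import numpy as np


def tag_probabilities(tag_scores):
    """Row-wise softmax: (n_words, n_tags) raw scores -> per-word tag probabilities."""
    # Subtract each row's max before exponentiating to avoid overflow.
    shifted = tag_scores - tag_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def predicted_tag_probabilities(tag_scores, tag_ids):
    """P(predicted tag | word) for each word, given the predicted tag indices."""
    probs = tag_probabilities(tag_scores)
    return probs[np.arange(len(tag_ids)), tag_ids]
```

These per-word probabilities ignore the CRF transition scores, so they remain a proxy for confidence rather than the probability of the decoded sequence.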

ishita1995 commented 6 years ago

Thanks a lot, it worked.