Closed ishita1995 closed 6 years ago
The LSTM-CRF model predicts a sequence of tags for the entire input sequence. As a result, there is no real notion of an entity score, only a sequence score (and the model returns the sequence with the best score). However, you can take the average of the LSTM probability scores over the tokens of your entity, which should give you a good proxy for a confidence score. For instance, if your sentence contains "Barack Obama" and the model tags these two words as "B_PER" and "E_PER", you can report the average (or the product) of P(B_PER|Barack) and P(E_PER|Obama) given by the model.
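A minimal sketch of that averaging idea, in plain Python. It assumes you have already extracted, for each token, the probability the model assigns to its predicted tag; the function name `entity_confidence` and the example numbers are hypothetical and not part of the tagger code.

```python
def entity_confidence(token_probs, start, end, method="mean"):
    """Combine per-token tag probabilities into one score for an entity.

    token_probs: list of floats, token_probs[i] = P(predicted_tag_i | token_i)
    start, end:  entity span as token indices, end exclusive
    method:      "mean" (average) or "product"
    """
    span = token_probs[start:end]
    if not span:
        raise ValueError("entity span is empty")
    if method == "mean":
        return sum(span) / len(span)
    if method == "product":
        result = 1.0
        for p in span:
            result *= p
        return result
    raise ValueError("method must be 'mean' or 'product'")


# Made-up probabilities for the tokens "Barack", "Obama", "spoke".
probs = [0.9, 0.8, 0.99]

# The entity "Barack Obama" covers tokens 0 and 1.
mean_score = entity_confidence(probs, 0, 2, method="mean")        # about 0.85
product_score = entity_confidence(probs, 0, 2, method="product")  # about 0.72
```

The product penalizes longer entities more than the average does, so the average is usually easier to compare across entities of different lengths.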
I understand what you are saying above, but I am quite new to Theano, so I am sorry if I am wrong. As I understand it, when the forward function is called in f_eval, the alpha variable would return the probability. But when I set return_best_sequence to False, the code breaks with the following error:
File "tagger.py", line 49, in classify_ner
y_preds = np.array(f_eval(*input))[1:-1]
IndexError: too many indices for array
What you want to look at is probably the tag probability scores: https://github.com/glample/tagger/blob/c735605fc2218975019aca04bd64796ac4f363bd/model.py#L278
Thanks a lot, it worked.
I cannot find a way to get a confidence score for an entity predicted by the model. Is there a way to calculate one?