flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

fix _all_scores_for_token in ViterbiDecoder #3455

Closed mauryaland closed 3 months ago

mauryaland commented 4 months ago

There was a bug where, for each sentence in a batch, the Viterbi decoder used the last tag sequence of the batch instead of the tag sequence corresponding to that sentence.
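A minimal, hypothetical sketch of the bug class (not flair's actual code): the per-token scores should be paired with each sentence's own tag sequence, but the buggy version always picked the last one in the batch.

```python
def buggy_all_scores(batch_tag_sequences):
    # BUG: every sentence is paired with the LAST sentence's tag sequence
    return [batch_tag_sequences[-1] for _ in batch_tag_sequences]

def fixed_all_scores(batch_tag_sequences):
    # FIX: pair each sentence with its own tag sequence
    return [seq for seq in batch_tag_sequences]

batch = [["B-PER", "I-PER"], ["O", "O"]]
print(buggy_all_scores(batch))  # both entries are ["O", "O"]
print(fixed_all_scores(batch))  # each sentence keeps its own sequence
```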

alanakbik commented 3 months ago

@mauryaland thanks for fixing this!

alanakbik commented 3 months ago

To reproduce the problem that is fixed:

from flair.data import Sentence
from flair.models import SequenceTagger

# load a pretrained NER tagger
tagger: SequenceTagger = SequenceTagger.load("ner-fast")

sentence_1 = Sentence("Mr John Smith arrived")
sentence_2 = Sentence("Hey ho ho ho")

# predict on a batch of two sentences, requesting per-token
# probability distributions over all classes
tagger.predict(
    [sentence_1, sentence_2],
    force_token_predictions=True,
    return_probabilities_for_all_classes=True,
)

# compare each token's predicted tag with its probability distribution
print(sentence_1[1])
print(sentence_1[1].get_tags_proba_dist("ner"))

print()

print(sentence_1[2])
print(sentence_1[2].get_tags_proba_dist("ner"))

The tag probability distribution reports a different probability for the predicted tag than the prediction itself. That is because it was built from the last tag sequence in the batch (sentence_2's) rather than from the sentence's own sequence. This is fixed in the PR.
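After the fix, a simple invariant should hold: the probability the distribution assigns to the predicted tag matches the prediction's own score, and the predicted tag is the argmax of the distribution. A hypothetical check, using plain dicts to stand in for flair's label objects:

```python
def check_consistency(predicted_tag, predicted_score, proba_dist):
    # the distribution's probability for the predicted tag must match
    assert abs(proba_dist[predicted_tag] - predicted_score) < 1e-6
    # and the predicted tag should be the argmax of the distribution
    assert max(proba_dist, key=proba_dist.get) == predicted_tag

# before the fix, this check failed for every sentence in a batch
# except the last one
check_consistency("B-PER", 0.98, {"B-PER": 0.98, "O": 0.02})
```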