chokkan / crfsuite

CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
http://www.chokkan.org/software/crfsuite/

Question: Is "n-best" tagging possible with CRFSuite? #84

Open wrznr opened 7 years ago

wrznr commented 7 years ago

The Wapiti CRF toolkit has a neat feature called N-best Viterbi output, which returns the n best label sequences for an input sequence. Is there similar functionality in CRFsuite?

Thanks for your hints!

usptact commented 7 years ago

CRFsuite does not support n-best output. Its decoding algorithm is Viterbi, which does not appear too difficult to extend to n-best (especially for short sequences).

Did you manage to get meaningful n-best outputs with Wapiti on your data? I looked at it a while ago and found that on my data (NER) the n-best outputs do not always make sense.
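
To make the "extend Viterbi to n-best" idea concrete, here is a minimal pure-Python sketch. It is not CRFsuite code and uses no CRFsuite API: the function name `nbest_viterbi`, the score tables `start`, `trans`, and `emit`, and the toy numbers are all assumptions made for illustration. Instead of keeping one best path per label at each position, it keeps up to n.

```python
import heapq


def nbest_viterbi(start, trans, emit, n):
    """Return up to n (score, path) pairs, best first, for a linear-chain model.

    All arguments are hypothetical log-score tables, not CRFsuite structures:
      start[y]         : score of starting in label y
      trans[y_prev][y] : score of moving from label y_prev to label y
      emit[t][y]       : score of label y at position t
    """
    num_labels = len(start)
    # beams[y] holds up to n (score, path) pairs for partial paths ending in y.
    beams = [[(start[y] + emit[0][y], (y,))] for y in range(num_labels)]
    for t in range(1, len(emit)):
        new_beams = []
        for y in range(num_labels):
            # Extend every kept path from every previous label into label y.
            candidates = (
                (score + trans[y_prev][y] + emit[t][y], path + (y,))
                for y_prev in range(num_labels)
                for score, path in beams[y_prev]
            )
            # Plain Viterbi would keep only the single best candidate here.
            new_beams.append(heapq.nlargest(n, candidates))
        beams = new_beams
    # Merge the per-label lists at the last position and keep the n best.
    return heapq.nlargest(n, (item for beam in beams for item in beam))


# Toy model with two labels and three positions (made-up scores).
start = [0.0, -1.0]
trans = [[-0.5, -1.5], [-2.0, -0.2]]
emit = [[-0.1, -2.0], [-1.0, -0.3], [-0.4, -0.9]]

best = nbest_viterbi(start, trans, emit, n=3)
for score, path in best:
    print(path, round(score, 3))
```

Keeping n candidates per label and position multiplies the work of ordinary Viterbi by roughly a factor of n, which is one reason this is most comfortable for short sequences.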

ZmeiGorynych commented 6 years ago

How about looking at the marginal probabilities for all possible labels at a given position and picking the n best values? That functionality exists in the Python wrapper as pycrfsuite.Tagger.marginal(), so I presume it also exists in CRFsuite itself.

usptact commented 6 years ago

@ZmeiGorynych Unfortunately, marginals are not enough to compute the n-best sequence taggings.
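
A small toy example may help show why. The joint distribution below is invented for illustration and is not CRFsuite output; the names `joint`, `marginal`, `true_ranking`, and `by_marginals` are hypothetical. It compares the true ranking of whole sequences with a ranking built only from per-position marginals.

```python
from itertools import product

# Made-up joint probabilities over two-token label sequences.
joint = {
    ("A", "A"): 0.40,
    ("B", "B"): 0.35,
    ("A", "B"): 0.25,
    ("B", "A"): 0.00,
}
labels = ("A", "B")


def marginal(label, position):
    """Probability that `position` carries `label`, summed over all sequences."""
    return sum(p for seq, p in joint.items() if seq[position] == label)


# True ranking: sort whole sequences by their joint probability.
true_ranking = sorted(joint, key=joint.get, reverse=True)

# Marginal-based ranking: score each sequence by the product of its
# per-position marginals, which ignores how adjacent labels interact.
by_marginals = sorted(
    product(labels, repeat=2),
    key=lambda seq: marginal(seq[0], 0) * marginal(seq[1], 1),
    reverse=True,
)

print("true 2-best:    ", true_ranking[:2])
print("from marginals: ", by_marginals[:2])
```

In this toy case the sequence that looks best from the marginals, ("A", "B"), is only the third most probable sequence overall, because marginals describe each position separately and carry no information about which labels occur together.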