chokkan / crfsuite

CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
http://www.chokkan.org/software/crfsuite/

Restrict tags for each item #72

Open: y-bo opened this issue 8 years ago

y-bo commented 8 years ago

Is there a way to restrict the set of possible tags for each item? For example, I want to do morphological disambiguation, so each word has a small set of possible tags (from a dictionary), rather than all tags seen for all words.
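To make the request concrete, here is a minimal sketch of first-order Viterbi decoding where each position only considers its own candidate tags. This is not CRFsuite's API. It assumes the per-position state scores (`emit`) and the transition scores (`trans`) have already been computed from a trained model, and every name and score below is hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def constrained_viterbi(
    emit: List[Dict[str, float]],
    trans: Dict[Tuple[str, str], float],
    allowed: Sequence[Set[str]],
) -> List[str]:
    """Best tag sequence when position t may only take a tag from allowed[t].

    emit[t][y]: state score of tag y at position t (must exist for allowed tags)
    trans[(p, y)]: transition score from tag p to tag y; missing pairs count as 0.0
    allowed[t]: non-empty candidate set for position t (e.g. from a dictionary)
    """
    n = len(emit)
    if n == 0:
        return []
    # best[t][y]: score of the best path ending in tag y at position t
    best: List[Dict[str, float]] = [{y: emit[0][y] for y in allowed[0]}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in allowed[t]:
            # Only predecessors in allowed[t - 1] are scored: |A[t-1]| * |A[t]| work.
            prev, score = max(
                ((p, best[t - 1][p] + trans.get((p, y), 0.0)) for p in allowed[t - 1]),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emit[t][y]
            back[t][y] = prev
    # Follow back-pointers from the best final tag.
    y = max(best[-1], key=best[-1].get)
    path = [y]
    for t in range(n - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


# Toy example with made-up scores: "time flies" with tags NOUN, VERB, ADJ.
emit = [
    {"NOUN": 2.0, "VERB": 1.0, "ADJ": 0.5},
    {"NOUN": 1.0, "VERB": 1.5, "ADJ": 3.0},
]
trans = {("NOUN", "VERB"): 1.0}
dictionary = [{"NOUN", "VERB"}, {"NOUN", "VERB"}]  # hypothetical dictionary lookup
all_tags = [{"NOUN", "VERB", "ADJ"}] * 2

print(constrained_viterbi(emit, trans, all_tags))    # ['NOUN', 'ADJ']
print(constrained_viterbi(emit, trans, dictionary))  # ['NOUN', 'VERB']
```

Besides saving work, the restriction can change the answer: the unconstrained decoder picks ADJ for "flies", which the dictionary rules out.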

usptact commented 8 years ago

Perhaps I don't understand your problem, but to me a CRF is about learning which tags are possible for each item from a training set. In other words, your model learns the most likely tags at each item in the sequence. Besides, one does not generally know a priori which tags are likely at a given item.

kmike commented 8 years ago

@usptact The reason to restrict the number of tags is efficiency: you may have 1000 tags in your tag set, but only 5 possible tags for an item according to a dictionary. Time complexity is O(N^2) in the number of tags N (every pair of adjacent tags is scored at each position), so the effect can be pretty large.
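A quick back-of-the-envelope check of that claim, as a sketch assuming a first-order linear-chain model, where decoding scores every (previous tag, current tag) pair at each position after the first. The function names are hypothetical; the counts ignore constant factors and state scores.

```python
from typing import Sequence, Set


def full_cost(length: int, num_tags: int) -> int:
    """Tag-pair scores evaluated by first-order Viterbi over the full tag set."""
    # Every position after the first compares all N previous tags with all N current tags.
    return max(length - 1, 0) * num_tags * num_tags


def restricted_cost(allowed: Sequence[Set[str]]) -> int:
    """Tag-pair scores evaluated when each position only considers its candidate tags."""
    return sum(len(allowed[t - 1]) * len(allowed[t]) for t in range(1, len(allowed)))


# A 20-word sentence, 1000 tags vs. 5 dictionary candidates per word.
print(full_cost(20, 1000))                               # 19000000
print(restricted_cost([{"a", "b", "c", "d", "e"}] * 20))  # 475
```

For this toy setting the restricted decoder evaluates 40,000 times fewer tag pairs. The same pairwise cost appears in the forward-backward pass used during training.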

usptact commented 8 years ago

@kmike Thanks for the clarification! I see the point now. For this to work, if I understand correctly, one must know at which items to restrict the possible tag set.

y-bo commented 8 years ago

@usptact Yeah, in my case each item has a set of features that contains its possible tags.
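If the dictionary candidates are already encoded as item features, the per-item tag sets can be recovered from them. The sketch below assumes a hypothetical `cand=TAG` feature naming convention; the actual feature names in this setup are not stated. Falling back to the full tag set for words with no dictionary entry is a design choice, not something the thread specifies.

```python
from typing import Iterable, List, Set

CANDIDATE_PREFIX = "cand="  # hypothetical naming convention for dictionary features


def allowed_tags(items: List[Iterable[str]], all_tags: Set[str]) -> List[Set[str]]:
    """Candidate tag set for each item, read from its 'cand=TAG' features."""
    result: List[Set[str]] = []
    for features in items:
        cands = {f[len(CANDIDATE_PREFIX):] for f in features if f.startswith(CANDIDATE_PREFIX)}
        # Drop unknown tags; out-of-dictionary words may take any tag.
        cands &= all_tags
        result.append(cands if cands else set(all_tags))
    return result


items = [
    ["w=time", "cand=NOUN", "cand=VERB"],
    ["w=flies", "cand=NOUN", "cand=VERB"],
    ["w=xyzzy"],  # not in the dictionary
]
tags = {"NOUN", "VERB", "ADJ"}
print([sorted(s) for s in allowed_tags(items, tags)])
# [['NOUN', 'VERB'], ['NOUN', 'VERB'], ['ADJ', 'NOUN', 'VERB']]
```

These sets could then be handed to a decoder that only scores the listed tags at each position.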