chokkan / crfsuite

CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
http://www.chokkan.org/software/crfsuite/

Bias term? #60

Closed: usptact closed this issue 8 years ago

usptact commented 8 years ago

I sometimes use CRFsuite for document classification. All the features for a document go on a single line, with the label as the first token, as the data format requires.
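For readers unfamiliar with that layout, here is a minimal sketch of how one document might be written out as a single CRFsuite item. It assumes the tab-separated text format (label first, then attributes) and that a blank line ends each sequence, so every document becomes a one-item sequence. The `w=` attribute prefix and the helper names are hypothetical, and the escaping of `:` and `\` reflects my reading of the format rather than anything stated in this thread.

```python
def escape_attr(name: str) -> str:
    """Escape characters that are assumed to be special in attribute names."""
    # Backslash first, so the backslashes added for ':' are not escaped again.
    return name.replace("\\", "\\\\").replace(":", "\\:")


def doc_to_crfsuite(label: str, tokens: list[str]) -> str:
    """Serialize one short document as a one-item sequence.

    The label is the first field, followed by one attribute per token.
    The trailing blank line marks the end of the sequence.
    """
    attrs = [f"w={escape_attr(tok)}" for tok in tokens]  # 'w=' is a made-up prefix
    return "\t".join([label] + attrs) + "\n\n"


if __name__ == "__main__":
    print(doc_to_crfsuite("sports", ["cup", "final", "tonight"]), end="")
```

With one item per sequence there are no neighbouring labels, so only the per-item (state) features can influence the prediction.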

In the classic logistic regression setup, one fits the model by finding two sets of parameters: a coefficient matrix theta (number of features x number of classes) and a bias term. CRFsuite gives the coefficient matrix but no bias term. Is a bias term necessary for classification?

After all, according to some seminal papers on sequence analysis, a CRF is just a generalization of logistic regression to sequences.
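To make that relationship concrete, here is a sketch of the standard linear-chain CRF form and what it reduces to for a one-item sequence. The notation ($\lambda_k$, $f_k$, $\theta_c$, $b_c$) is chosen for illustration and is not taken from CRFsuite's documentation.

```latex
% Linear-chain CRF over a sequence of length T
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)

% With T = 1 there are no transitions. Using only state features
% f_{c,j}(y, \mathbf{x}) = \mathbb{1}[y = c] \, x_j, this becomes
% multinomial logistic regression without a bias:
p(y = c \mid \mathbf{x})
  = \frac{\exp(\theta_c^{\top} \mathbf{x})}
         {\sum_{c'} \exp(\theta_{c'}^{\top} \mathbf{x})}

% Appending a constant feature x_0 = 1 adds a per-class weight
% \theta_{c,0} that plays the role of the bias b_c:
p(y = c \mid \mathbf{x})
  = \frac{\exp(\theta_c^{\top} \mathbf{x} + b_c)}
         {\sum_{c'} \exp(\theta_{c'}^{\top} \mathbf{x} + b_{c'})}
```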

Thanks

kmike commented 8 years ago

@usptact You can add a feature that is 1 for all training examples; that acts as a bias feature. One difference is that a bias is usually not regularized, whereas this feature will be regularized like any other. If that matters, you can use a value like 100 instead of 1. The weight then only needs to be 1/100 as large to have the same effect, so the penalty on it is much smaller and the result is similar to not regularizing the bias.
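A minimal sketch of that suggestion, assuming the tab-separated CRFsuite text format in which an attribute may carry a value written as `name:value`. The function name `add_bias` and the attribute name `bias` are hypothetical. The same feature has to be added at prediction time as well, otherwise the learned bias weight is never applied.

```python
def add_bias(line: str, value: float = 1.0, name: str = "bias") -> str:
    """Append a constant attribute to one item line of CRFsuite-style data.

    Blank lines are returned unchanged, because they are assumed to mark
    sequence boundaries rather than items.
    """
    line = line.rstrip("\n")
    if not line:
        return line
    # ':g' prints 1.0 as '1' and 100.0 as '100'.
    return f"{line}\t{name}:{value:g}"


if __name__ == "__main__":
    data = ["sports\tw=cup\tw=final", "", "finance\tw=bond\tw=yield", ""]
    for row in data:
        print(add_bias(row, value=100))
```

On why a larger value weakens regularization: to contribute a bias b to the score, the weight on a feature with constant value v must be b / v. An L2 penalty on that weight is proportional to (b / v)^2, so v = 100 shrinks it by a factor of 10,000; an L1 penalty shrinks by a factor of 100.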

jlerouge commented 8 years ago

@usptact Yes, the bias is important, especially if you have unbalanced classes. Mikhail already told you how to do it.

Independently of your initial question, I wonder what the point is of using a CRF to classify a single document. You could use other frameworks designed more specifically for this purpose (neural networks, SVMs...).

usptact commented 8 years ago

@kmike @jlerouge Thank you both for the input! I now understand the role of the bias term better. The reason I am doing this is that CRFsuite is now an embedded piece of software in our NLP pipeline. We train many CRF models, primarily for NER but also for Chinese segmentation and domain classification (classifying short documents). The advantage is that the same tool is used everywhere, and it is very fast to train.

usptact commented 8 years ago

@jlerouge Another reason for using linear logistic regression is that it is perfectly sufficient for the task. We found very little gain (if any) from non-linear techniques for short text classification. Short texts here are documents of 3-10 words.