usptact closed this issue 8 years ago
@usptact you can add a feature which is 1 for all training examples; that'd be a bias feature. One difference is that usually the bias is not regularized, but this feature will be regularized like the other features. If that's important, you can use a value like 100 instead of 1: the weight needed for the same bias is then 100 times smaller, so the penalty on it is tiny and the effect is similar to not regularizing the bias.
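A minimal sketch of that suggestion, assuming each item is a dict mapping feature names to float values (one of the input shapes python-crfsuite accepts). The helper name `add_bias` and the feature name `"bias"` are made up for illustration.

```python
def add_bias(xseq, value=1.0, name="bias"):
    """Return a copy of a sequence of feature dicts with a constant feature added.

    xseq  -- list of dicts mapping feature name -> float value (assumed shape)
    value -- constant value of the bias feature; a larger value such as 100.0
             reaches the same bias with a smaller weight, so the regularizer
             penalizes it less
    name  -- hypothetical feature name, any unused name works
    """
    # Build new dicts so the caller's feature dicts are left untouched.
    return [{**features, name: value} for features in xseq]


# A short "document" treated as a one-item sequence.
doc = [{"w=cheap": 1.0, "w=flights": 1.0}]

plain = add_bias(doc)        # bias feature regularized like any other feature
weak = add_bias(doc, 100.0)  # bias feature that is only weakly regularized
```

With an L2 penalty, a bias contribution `b` needs a weight of `b / value`, so the penalty on it shrinks by a factor of `value` squared (10,000 for `value = 100`).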
@usptact Yes, the bias is important, especially if you have unbalanced classes. Mikhail already told you how to do it.
Independently of your initial question, I wonder what the point is of using a CRF to classify a single document? You could use other frameworks more specifically designed for this purpose (neural networks, SVMs...).
@kmike @jlerouge Thank you guys for the input! I now understand the role of the bias term better. The reason I am doing this is that CRFSuite is now embedded in our NLP pipeline. We train many CRF models, which primarily do NER but also Chinese segmentation and domain classification (classifying short documents). The advantage is that the same tool is used everywhere, and it is very fast to train.
@jlerouge Another reason for using linear logistic regression is that it is perfectly sufficient for the task. We found very little gain (if any) from non-linear techniques for short text classification. Short texts here are documents composed of 3-10 words.
Sometimes I use CRFSuite for document classification. All the features for a document are simply put on a single line, with the label as the first token on that line, as defined by the data format.
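To illustrate the layout described above, here is a rough sketch that writes each document as a one-item sequence: the label first, then tab-separated `name:value` attributes, with a blank line ending the sequence. The function names and sample features are hypothetical, and the escaping of `:` and `\` reflects my reading of the CRFsuite data format, so check it against the manual.

```python
def to_crfsuite_line(label, features):
    """Format one document as a single item: label, then tab-separated attributes.

    features -- dict mapping attribute name -> numeric value (hypothetical shape)
    """
    def escape(name):
        # ':' separates an attribute name from its value, so backslashes and
        # colons inside names are escaped (assumption, see the lead-in).
        return name.replace("\\", "\\\\").replace(":", "\\:")

    attrs = [f"{escape(name)}:{value}" for name, value in features.items()]
    return "\t".join([label] + attrs)


def to_crfsuite_data(docs):
    """Join documents as one-item sequences; a blank line ends each sequence."""
    return "".join(to_crfsuite_line(label, feats) + "\n\n" for label, feats in docs)


docs = [
    ("travel", {"w=cheap": 1, "w=flights": 1, "bias": 1}),
    ("sports", {"w=final": 1, "w=score": 1, "bias": 1}),
]
data = to_crfsuite_data(docs)
```

Because every sequence has exactly one item, no transition between labels is ever observed, and the model only learns per-label feature weights.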
In the classic logistic regression setup, one fits the model by finding the parameters: theta (a matrix of size number of features x number of classes) and a bias term. CRFSuite gives the matrix of coefficients but no bias term. Is the bias necessary for classification?
After all, a CRF is just a generalization of logistic regression to sequences, according to some seminal papers on sequence analysis.
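To make that reduction concrete, here is a short worked derivation for a sequence of length 1, where no transition features fire. The notation is an assumption introduced here, not taken from CRFSuite: f(x) is the feature vector, theta_y is the weight column for label y, c is the constant value of the added bias feature f_0, and lambda is the L2 regularization strength.

```latex
% One-item sequence: the CRF is a softmax over labels.
p(y \mid x) = \frac{\exp\big(\theta_y^\top f(x)\big)}
                   {\sum_{y'} \exp\big(\theta_{y'}^\top f(x)\big)}

% Adding a constant feature f_0(x) = c splits the score into bias + rest.
\theta_y^\top f(x) = \underbrace{c\,\theta_{y,0}}_{b_y}
                     + \sum_{k \ge 1} \theta_{y,k}\, f_k(x)

% The L2 penalty paid for a given bias b_y shrinks as c grows.
\lambda\,\theta_{y,0}^2 = \frac{\lambda\, b_y^2}{c^2}
```

So the per-class bias b_y of logistic regression is recovered as the weight of the constant feature, and choosing c = 100 cuts the penalty on it by a factor of 10,000.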
Thanks