Open shyamupa opened 8 years ago
Your suggestion is valid.
But I don't see an issue here: he can always set b1 = 0, so there is only a single bias term to tune.
Also, when there are many classes, you need a bias term for each class. Do I understand the question correctly?
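To spell out the b1 = 0 point, here is a minimal sketch (plain numpy, not the library's API; the function names and toy weights are made up for illustration) showing that in the binary case only the difference of the two biases matters, so fixing b1 = 0 leaves a single effective bias to tune:

```python
import numpy as np

# Two-class scoring as in the multiclass setup:
#   predict class 1 if  w1 . x + b1 > w2 . x + b2
# Subtracting the class-2 score shows only the differences matter:
#   (w1 - w2) . x + (b1 - b2) > 0
# so with b1 fixed to 0, b2 is the single bias left to tune.

def predict_two_bias(x, w1, b1, w2, b2):
    return 1 if w1 @ x + b1 > w2 @ x + b2 else 2

def predict_single_bias(x, w1, w2, b2):
    # equivalent rule once b1 is fixed to 0
    return 1 if (w1 - w2) @ x > b2 else 2

# toy check that the two rules agree when b1 = 0
rng = np.random.default_rng(0)
w1, w2, x = rng.normal(size=5), rng.normal(size=5), rng.normal(size=5)
b2 = 0.3
assert predict_two_bias(x, w1, 0.0, w2, b2) == predict_single_bias(x, w1, w2, b2)
```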
A student came with the following problem.
He is doing binary classification using the multiclass code. The issue is that he wants to tune his bias on the dev data instead of using the learnt one. The problem is that, because of the `FeatureVectorBuffer` shift, he now has two bias parameters, one for each class: `w1·x + b1 > w2·x + b2`.
Tuning this is harder than tuning a single `b`. I am thinking of suggesting that he turn off both learnt biases and then impose his single bias at test time. Is there a better solution, @KaiWeiChang? This looks cumbersome when dealing with many classes. Do we want to change how `shift` behaves?
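A minimal sketch of what I have in mind (plain numpy, not the library's API; the helper name and the grid of candidate biases are made up for illustration): train with the learnt biases turned off, then pick a single bias on the dev data and apply it at test time.

```python
import numpy as np

def tune_bias_on_dev(dev_scores, dev_labels, candidates):
    """Pick the single bias b that maximizes dev accuracy.

    dev_scores[i] = (w1 - w2) . x_i, computed with the learnt biases
    turned off; predictions are class 1 when score + b > 0, else class 2.
    """
    best_b, best_acc = 0.0, -1.0
    for b in candidates:
        preds = np.where(dev_scores + b > 0, 1, 2)
        acc = np.mean(preds == dev_labels)
        if acc > best_acc:
            best_b, best_acc = b, acc
    return best_b

# hypothetical usage with scores from a model trained without a bias feature
dev_scores = np.array([1.2, -0.4, 0.1, -2.0])
dev_labels = np.array([1, 2, 1, 2])
b = tune_bias_on_dev(dev_scores, dev_labels, np.linspace(-1.0, 1.0, 41))
# at test time: predict class 1 iff (w1 - w2) . x + b > 0
```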