Closed sgbaird closed 3 years ago
Still curious if you have thoughts on this
Hi Sterling,
Sorry I missed your previous comment. As with other regularized optimization problems, it is hard to say in general how to choose the right regularization parameter lambda.
Usually, a good practice is to choose lambda so that the first term ( |C X|^2 ) and the second term ( |lambda B X|^2 ) have similar magnitudes when the objective function is at its minimum. Another way is to choose lambda so that, after minimization, the magnitude of |B X|^2 is close to your estimate of the experimental error.
As you can imagine, the first approach is easier than the second, and it is the one used in my code. I remember the value is approximately the number of rows of C divided by the number of rows of B. In fact, using a larger lambda didn't change the result much in my testing; I suspect the experimental error is averaged out somehow.
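The magnitude-matching heuristic above can be sketched in a short Python example. Everything here is hypothetical for illustration: the matrix names C and B follow the discussion, but the data vector d, the shapes, and the random entries are all made up, and I'm assuming the objective has the standard stacked least-squares form min ||C x - d||^2 + ||lambda B x||^2.

```python
import numpy as np

# Hypothetical setup: C is the data-fit operator, B the regularization operator.
rng = np.random.default_rng(0)
n_c, n_b, n_x = 200, 20, 20   # C has many more rows than B
C = rng.standard_normal((n_c, n_x))
d = rng.standard_normal(n_c)
B = rng.standard_normal((n_b, n_x))

# Rule of thumb from the comment above: lambda ~ rows(C) / rows(B)
lam = n_c / n_b

# Solve min ||C x - d||^2 + ||lam * B x||^2 by stacking the augmented system
# [C; lam*B] x = [d; 0] and calling an ordinary least-squares solver.
A = np.vstack([C, lam * B])
rhs = np.concatenate([d, np.zeros(n_b)])
x, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# Inspect the two terms at the minimum; the heuristic aims for them to be
# of comparable magnitude (adjust lam and re-solve if they are far apart).
term1 = np.linalg.norm(C @ x - d) ** 2
term2 = np.linalg.norm(lam * (B @ x)) ** 2
print(term1, term2)
```

The stacked-matrix trick is just one standard way to solve a Tikhonov-style problem; whatever solver the actual code uses, the check at the end (comparing the two terms after minimization) is the practical part of the heuristic.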
In conclusion, 0.01, 0.1, and 1 would be too small (I don't think your B would have more rows than C). I recommend using a larger lambda; it didn't cause problems in my tests.
Yufeng
BTW, I don't think there is a single correct answer regarding how to choose the hyperparameters in a model. It can be a research topic by itself.
Great, thanks for this. I was having trouble with 5DOF interpolation results on some experimental data. I also agree that choosing hyperparameters can get pretty tricky. This gives me some great info to go on. Thank you.
Hi Yufeng,
The paper's appendix mentions that for a large enough value of lambda the "optimization problem approximately becomes" a simpler form, and that using a smaller lambda can help with noisier datasets. Any comments on a minimum regularization strength that can be used without the approximation breaking down? For example, would 0.01, 0.1, 1, etc. be considered "too small"?
Sterling