stefan-k closed this 7 months ago
Maybe the original author of OWL-QN @vbkaisetsu can help with this.
The objective function of the L1 regularization is different (a regularization term is added), so the best parameter is different in the first place. I would like to check if this behavior is correct in OWL-QN, but may not be able to address this right away.
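To make the point concrete, here is a short worked derivation. It assumes an L1 weight of 1 (the penalty |x| + |y|, as in rosenbrock_with_l1 in this thread) and the region x > 0, y > 0; the weight actually used by the example is not stated here, so treat this as an illustration only.

```latex
\begin{aligned}
F(x, y) &= (1 - x)^2 + 100\,(y - x^2)^2 + |x| + |y| \\
\frac{\partial F}{\partial y} &= 200\,(y - x^2) + 1 = 0
  \;\Rightarrow\; y - x^2 = -\tfrac{1}{200} \\
\frac{\partial F}{\partial x} &= -2\,(1 - x) - 400\,x\,(y - x^2) + 1
  = -2 + 2x + 2x + 1 = 4x - 1 = 0
  \;\Rightarrow\; x = \tfrac{1}{4} \\
y &= \tfrac{1}{16} - \tfrac{1}{200} = 0.0575
\end{aligned}
```

Under that assumption the regularized stationary point is (0.25, 0.0575), whereas the plain Rosenbrock function has its minimum at (1, 1). That is close to the [0.24, 0.054] reported in the issue, which would be consistent with the solver heading toward the regularized minimum rather than getting stuck.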
I tried a grid search for the parameters that minimize the following two functions:
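// Plain Rosenbrock function (a = 1, b = 100).
fn rosenbrock(x: f32, y: f32) -> f32 {
    (1.0 - x).powi(2) + 100.0 * (y - x.powi(2)).powi(2)
}

// Rosenbrock function plus an L1 penalty with weight 1.
fn rosenbrock_with_l1(x: f32, y: f32) -> f32 {
    (1.0 - x).powi(2) + 100.0 * (y - x.powi(2)).powi(2) + x.abs() + y.abs()
}

fn main() {
    // Best (x, y, value) found so far.
    let mut result = (0.0, 0.0, f32::INFINITY);
    // Scan the grid [0, 1) x [0, 1) with step 1/2000.
    for x in 0..2000 {
        let x = x as f32 / 2000.0;
        for y in 0..2000 {
            let y = y as f32 / 2000.0;
            // Swap in rosenbrock_with_l1 to search the regularized objective.
            let z = rosenbrock(x, y);
            if z < result.2 {
                result = (x, y, z);
            }
        }
    }
    dbg!(result);
}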
Results:
I think this result is similar to the example.
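For anyone who wants to repeat the comparison in one run, the following is a minimal sketch that applies the same grid search to both objectives. The grid_search helper and the switch to f64 are assumptions made for this illustration and are not part of the original snippet.

```rust
/// Rosenbrock function with a = 1, b = 100 (same form as in the thread).
fn rosenbrock(x: f64, y: f64) -> f64 {
    (1.0 - x).powi(2) + 100.0 * (y - x.powi(2)).powi(2)
}

/// Rosenbrock plus an L1 penalty with unit weight.
fn rosenbrock_with_l1(x: f64, y: f64) -> f64 {
    rosenbrock(x, y) + x.abs() + y.abs()
}

/// Hypothetical helper: exhaustive search over the grid
/// {0, 1/n, ..., (n-1)/n}^2, returning (x, y, f(x, y)) at the best point.
fn grid_search(f: fn(f64, f64) -> f64, n: u32) -> (f64, f64, f64) {
    let mut best = (0.0, 0.0, f64::INFINITY);
    for i in 0..n {
        let x = f64::from(i) / f64::from(n);
        for j in 0..n {
            let y = f64::from(j) / f64::from(n);
            let z = f(x, y);
            if z < best.2 {
                best = (x, y, z);
            }
        }
    }
    best
}

fn main() {
    let plain = grid_search(rosenbrock, 2000);
    let l1 = grid_search(rosenbrock_with_l1, 2000);
    println!("plain:   {:?}", plain);
    println!("with L1: {:?}", l1);
}
```

With a unit L1 weight, the regularized search should land near (0.25, 0.0575), while the plain search should end at the grid corner closest to (1, 1), since the grid stops just short of 1.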
> The objective function of the L1 regularization is different (a regularization term is added), so the best parameter is different in the first place.
Ah yeah, that absolutely makes sense! Now that you mention it, I faintly remember having this confusion already in the past. Sorry about the fuss and a huge thanks for the quick response! :)
The owl-qn example does not converge. Starting from [-1.2, 1.0] for the Rosenbrock test function, it actually moves away from the optimal point and gets stuck at [0.24, 0.054]. Regular L-BFGS does not exhibit this behaviour.
Output:
This is when using rosenbrock_derivative to calculate the derivative. Using finitediff, it ends up at a similar point, but much quicker.
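One way to rule out a gradient bug when two derivative sources behave differently is to compare them at the starting point. The sketch below does that with a hand-derived gradient and a hand-written central difference. The names rosenbrock_grad and central_diff are hypothetical helpers for this illustration; they are not the library's rosenbrock_derivative or the finitediff crate API.

```rust
/// Rosenbrock function with a = 1, b = 100.
fn rosenbrock(p: [f64; 2]) -> f64 {
    let [x, y] = p;
    (1.0 - x).powi(2) + 100.0 * (y - x * x).powi(2)
}

/// Hand-derived gradient (hypothetical helper, not the library function).
fn rosenbrock_grad(p: [f64; 2]) -> [f64; 2] {
    let [x, y] = p;
    [
        -2.0 * (1.0 - x) - 400.0 * x * (y - x * x),
        200.0 * (y - x * x),
    ]
}

/// Central finite-difference gradient with step `h` (hypothetical helper).
fn central_diff(f: fn([f64; 2]) -> f64, p: [f64; 2], h: f64) -> [f64; 2] {
    let mut g = [0.0; 2];
    for i in 0..2 {
        // Perturb one coordinate at a time in both directions.
        let (mut hi, mut lo) = (p, p);
        hi[i] += h;
        lo[i] -= h;
        g[i] = (f(hi) - f(lo)) / (2.0 * h);
    }
    g
}

fn main() {
    // Starting point used in the issue.
    let start = [-1.2, 1.0];
    let exact = rosenbrock_grad(start);
    let approx = central_diff(rosenbrock, start, 1e-6);
    println!("analytic:           {:?}", exact);
    println!("central difference: {:?}", approx);
}
```

If the two gradients agree to several digits, as they should for this smooth function, the difference in iteration count more likely comes from how the solver uses the gradients than from an error in either derivative.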