AtheMathmo / rusty-machine

Machine Learning library for Rust
https://crates.io/crates/rusty-machine/
MIT License

Improvements to linear models #145

Open AtheMathmo opened 7 years ago

AtheMathmo commented 7 years ago

The current techniques used in the linear models are sub-par and numerically unstable. There are better techniques available that we should use. This is most notably an issue for the GLM module - but improvements can be made in the linear model and logistic model as well (which are of course special cases).

The current implementations are as follows:

We should focus on improving GLMs, as the issues crop up very often there. The "GLMs, abridged" article provides a great overview of GLMs in R - we can make good use of it as a resource for rusty-machine. Rulinalg provides all the techniques we need - though of course without the performance given by BLAS/LAPACK.
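
For context, the standard fitting approach for GLMs (and what R's glm() builds on) is iteratively reweighted least squares. With linear predictor η = Xβ, mean μ = g⁻¹(η), link g and variance function V, each iteration solves a weighted least-squares problem:

```
z_i = \eta_i + (y_i - \mu_i)\, g'(\mu_i)
w_i = 1 / \big( V(\mu_i)\, g'(\mu_i)^2 \big)
\beta^{\text{new}} = \arg\min_\beta \sum_i w_i \, (z_i - x_i^\top \beta)^2
```

So most of the stability question reduces to how that inner weighted solve is done - solving it via an orthogonal decomposition of the weighted system, rather than forming and inverting XᵀWX, is where most of the numerical stability is won or lost.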

For linear regression we can also make use of better techniques to solve the least squares problem. More research needed here.

Finally, we should add regularization to the linear models. The learning::toolkit::regularization module should do most of the work for us - though it was designed for neural nets and may need some changes.
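
To be concrete about the regularization piece (a sketch only - the exact scaling convention is up for debate, and the intercept should probably be left unpenalised): for L2/ridge the cost and gradient just pick up an extra term,

```
J(\beta)        = \frac{1}{2n} \lVert X\beta - y \rVert_2^2 + \frac{\lambda}{2} \lVert \beta \rVert_2^2
\nabla J(\beta) = \frac{1}{n} X^\top (X\beta - y) + \lambda \beta
```

so the gradient-descent-based models only need the penalty's cost and gradient added in, while the closed-form linear regression becomes the solve (XᵀX + nλI)β = Xᵀy, which is also better conditioned than the unregularised normal equations.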

angusturner commented 7 years ago

I'd like to help!

I'm quite new to ML, but would like to have a stab at Levenberg-Marquardt optimisation (or something similar). At a glance, the maths doesn't seem too difficult, so if this sounds like a useful contribution I'll get started on it. From what I understand, this could significantly outperform vanilla GD on linear and logistic regression (and neural nets).
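
For anyone unfamiliar, the core of Levenberg-Marquardt is a damped Gauss-Newton step (sketching from memory, conventions vary): with residuals r(β) = y − f(β) and Jacobian J = ∂f/∂β, each iteration solves

```
\big( J^\top J + \lambda\, \mathrm{diag}(J^\top J) \big)\, \delta = J^\top r,
\qquad \beta \leftarrow \beta + \delta
```

where λ is increased whenever a step fails to reduce the cost (pushing the update towards plain gradient descent) and decreased when it succeeds (pushing it towards Gauss-Newton).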

AtheMathmo commented 7 years ago

Thank you for your comment! I hadn't heard of the Levenberg-Marquardt algorithm before you suggested it here. It looks like it could be a valuable addition to the linear and logistic regression models. However from my own reading I'm not sure whether this would be a stable alternative in some of the more tricky GLMs (as described in the aforementioned article). That said - if you're happy to try and get this implemented I would be glad to review it!

As a final thought: it might be worth trying to be generic over the underlying algorithm. This would make it easy for us to compare algorithms across different models, and to give our users options for the cases where different algorithms are needed. This is probably quite a big task which we can put off for now.
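
To illustrate roughly what I mean - the names below are invented, nothing like this exists in the crate yet - the models could be made generic over a solver trait along these lines:

```rust
use std::error::Error;

/// Produces parameter estimates from a row-major design matrix `x`
/// (n_samples * n_features values) and a target vector `y`.
pub trait Solver {
    fn solve(&self, x: &[f64], n_features: usize, y: &[f64]) -> Result<Vec<f64>, Box<dyn Error>>;
}

/// A model parameterised by the algorithm used to fit it, so gradient
/// descent, IRLS, Levenberg-Marquardt, QR, etc. can be swapped and compared.
/// (Sketch only: not the crate's actual LinRegressor.)
pub struct LinRegressor<S: Solver> {
    solver: S,
    parameters: Option<Vec<f64>>,
}

impl<S: Solver> LinRegressor<S> {
    pub fn new(solver: S) -> Self {
        LinRegressor { solver, parameters: None }
    }

    pub fn train(&mut self, x: &[f64], n_features: usize, y: &[f64]) -> Result<(), Box<dyn Error>> {
        self.parameters = Some(self.solver.solve(x, n_features, y)?);
        Ok(())
    }
}
```

The existing model traits would stay as the user-facing interface; only the construction of a model would name the solver.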


Edit: For other readers, here is an implementation demonstrating viability.

phasedchirp commented 7 years ago

I'm pretty new to Rust, but would be interested in contributing if I'm able. For linear regression, one option is using one of the orthogonal-decomposition-based methods, as described here or here, with a bit on performance and numerical stability for the QR and Cholesky methods here.

So, if that sounds like an option of interest, I'd be interested in at least trying to implement it - a rough sketch of the idea is below.
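
This is illustrative only (a real version would use rulinalg's matrix types, and Householder reflections would be the more robust choice than Gram-Schmidt): factor X = QR with orthonormal Q, then back-substitute Rβ = Qᵀy instead of forming XᵀX.

```rust
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Least squares via a thin QR factorisation (modified Gram-Schmidt).
/// `cols` holds the design matrix column by column; assumes full column rank.
fn qr_least_squares(cols: &[Vec<f64>], y: &[f64]) -> Vec<f64> {
    let n = cols.len();
    let mut q: Vec<Vec<f64>> = cols.to_vec();
    let mut r = vec![vec![0.0f64; n]; n];

    // Orthonormalise the columns in place, recording R as we go.
    for k in 0..n {
        let norm = dot(&q[k], &q[k]).sqrt();
        r[k][k] = norm;
        for v in q[k].iter_mut() {
            *v /= norm;
        }
        let qk = q[k].clone();
        for j in (k + 1)..n {
            let rkj = dot(&qk, &q[j]);
            r[k][j] = rkj;
            for (v, &u) in q[j].iter_mut().zip(&qk) {
                *v -= rkj * u;
            }
        }
    }

    // Back-substitute R * beta = Q^T * y.
    let mut beta = vec![0.0f64; n];
    for k in (0..n).rev() {
        let mut s = dot(&q[k], y);
        for j in (k + 1)..n {
            s -= r[k][j] * beta[j];
        }
        beta[k] = s / r[k][k];
    }
    beta
}

fn main() {
    // Fit y = b0 + b1 * x on a tiny example: intercept column plus x column.
    let cols = vec![vec![1.0, 1.0, 1.0, 1.0], vec![0.0, 1.0, 2.0, 3.0]];
    let y = vec![1.0, 3.0, 5.0, 7.0];
    println!("{:?}", qr_least_squares(&cols, &y)); // approximately [1.0, 2.0]
}
```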

AtheMathmo commented 7 years ago

Hey @phasedchirp !

I think this sounds like a promising approach. If you have the time it would be great to see an implementation of this (alongside some benchmarks and accuracy comparisons, where possible). I'm fairly busy over the next few weeks but I'll do my best to be available for any questions you might have. The gitter channel is probably the best place but feel free to comment here too.

phasedchirp commented 7 years ago

@AtheMathmo Cool. I've got a rough first pass using the QR method, which seems to be working reasonably well (it passes test_regression and fails test_regression_datasets_trees, but with differences smaller than 1e-11 for both parameters and predictions). It still needs a lot of cleaning up, so I'll work on that and on putting together appropriate tests and comparisons.

nickjmeyer commented 7 years ago

Like @phasedchirp, I'm new to Rust (I've cloned some repos and that's about it :)), but I'm eager to contribute and I think a statistics / machine learning library might be my best bet.

For linear regression via least squares, QR or Cholesky is usually the way to go, with special cases for certain matrix structures. I believe this is what the \ operator does in Julia.
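
Worth spelling out the trade-off between the two: Cholesky factorises the normal equations, while QR works on X directly,

```
X^\top X \, \beta = X^\top y, \quad X^\top X = L L^\top
\qquad \text{vs.} \qquad
X = QR, \quad R \beta = Q^\top y
```

Forming XᵀX squares the condition number (κ(XᵀX) = κ(X)²), so Cholesky is the cheaper option and QR the safer default when the columns of X are nearly collinear.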

robsmith11 commented 7 years ago

How much worse in accuracy would you expect the current implementation of logistic regression to be compared to R's glm()? I just trained a model with 7 features and 20 million samples, and the calibration of the results seems far worse than R's. Is that to be expected for now, or should I investigate further?