AtheMathmo / rusty-machine

Machine Learning library for Rust
https://crates.io/crates/rusty-machine/
MIT License

[WIP] Issue #152 use cholesky factorization #155

Open andrewcsmith opened 7 years ago

andrewcsmith commented 7 years ago

This is basically a gigantic rewrite of the GaussianMixtureModel to make it use Cholesky factorization instead of the inverse/determinant approach. There's plenty wrong with this commit stylistically, but for now I'm just hacking at it.

cargo test -- --nocapture gmm_train

I can't figure out why the means of the model aren't separating; they all seem to drift in the same direction. Any thoughts? If something obvious jumps out, I'd like to hear it.
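
For context, the core of the change is to work with the Cholesky factor L of each covariance (sigma = L * L^T) instead of an explicit inverse and determinant: the log-determinant is 2 * sum(ln(L[i][i])), and the Mahalanobis term comes from a triangular solve. A minimal, dependency-free sketch of that idea (illustrative names, not the code in this PR):

```rust
/// Cholesky factorization of a small symmetric positive-definite matrix.
/// Returns the lower-triangular factor L, or None if the matrix is not SPD.
fn cholesky(sigma: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = sigma.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = sigma[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum <= 0.0 {
                    return None; // not positive definite
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    Some(l)
}

/// Gaussian log-density using the Cholesky factor:
/// log N(x | mu, sigma) = -0.5 * (d * ln(2*pi) + log det sigma + z^T z),
/// where L z = (x - mu) is solved by forward substitution and
/// log det sigma = 2 * sum(ln L_ii).
fn log_density(x: &[f64], mu: &[f64], l: &[Vec<f64>]) -> f64 {
    let d = x.len();
    // Forward substitution: solve L z = (x - mu).
    let mut z = vec![0.0; d];
    for i in 0..d {
        let mut sum = x[i] - mu[i];
        for k in 0..i {
            sum -= l[i][k] * z[k];
        }
        z[i] = sum / l[i][i];
    }
    let log_det: f64 = 2.0 * l.iter().enumerate().map(|(i, row)| row[i].ln()).sum::<f64>();
    let mahalanobis_sq: f64 = z.iter().map(|v| v * v).sum();
    -0.5 * ((d as f64) * (2.0 * std::f64::consts::PI).ln() + log_det + mahalanobis_sq)
}

fn main() {
    let sigma = vec![vec![0.2872, 0.0798], vec![0.0798, 0.2844]];
    let l = cholesky(&sigma).expect("covariance should be SPD");
    println!("{}", log_density(&[0.1, -0.2], &[0.0, 0.0], &l));
}
```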

andrewcsmith commented 7 years ago

I think it's getting closer. It doesn't work especially well yet, but my guess is that this is down to the initialization parameters: the model keeps getting stuck in a local minimum.

If you run the test a bunch of times, you occasionally see the log_lik drop below -2.0, and the means start to look much more realistic.

I'll work on KMeans initialization, and hopefully that will help things.
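
Concretely, by k-means initialization I mean seeding the GMM means from the k-means result rather than from random values. A rough, dependency-free sketch of that step (not the actual implementation; the inputs are illustrative):

```rust
/// Compute per-cluster centroids from k-means assignments, to use as the
/// initial GMM means.
fn initial_means(data: &[Vec<f64>], assignments: &[usize], k: usize) -> Vec<Vec<f64>> {
    let dim = data[0].len();
    let mut means = vec![vec![0.0; dim]; k];
    let mut counts = vec![0usize; k];
    // Sum each cluster's points.
    for (point, &cluster) in data.iter().zip(assignments) {
        counts[cluster] += 1;
        for (m, x) in means[cluster].iter_mut().zip(point) {
            *m += x;
        }
    }
    // Divide by the cluster sizes to get the centroids.
    for (mean, &count) in means.iter_mut().zip(&counts) {
        if count > 0 {
            for m in mean.iter_mut() {
                *m /= count as f64;
            }
        }
    }
    means
}
```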

andrewcsmith commented 7 years ago

It's pretty clear that adding the k-means initialization step does not help things.

initialized means:
⎡ 0.4962 -0.0198⎤
⎢-0.0147  0.5004⎥
⎣-0.5146 -0.5010⎦

# Final values
means:
⎡-0.0110 -0.0068⎤
⎢-0.0110 -0.0068⎥
⎣-0.0110 -0.0068⎦
log_lik:
-1.4304
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
cov:
⎡0.2872 0.0798⎤
⎣0.0798 0.2844⎦
test learning::gmm::gmm_train ... ok

For my particular application, I've switched back to k-means for the moment. I'll look into this weird convergence bug again later, but for the next week or so I need to do other things.

AtheMathmo commented 7 years ago

Thanks for your time on this. I'll take a look through the code and see if I can make sense of anything.

andrewcsmith commented 7 years ago

Just pushed the latest joy. It's good! It works quite well. The test uses k-means initialization by default, and I fixed a bug in the covariance calculation that sealed the deal.

It's still not ready to merge, because the design is pretty bad, but at least now there's an implementation to kick things off. I'll work on getting the gmm integration test to actually test things as well.
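
For context, the covariance update in the M-step is roughly the following; this is a hedged sketch of the standard formula, not the code in this PR, and the names are illustrative:

```rust
/// M-step covariance update for one component. `resp[i]` is the responsibility
/// of this component for sample `data[i]`, and `mean` is the updated component mean.
fn update_covariance(data: &[Vec<f64>], resp: &[f64], mean: &[f64]) -> Vec<Vec<f64>> {
    let dim = mean.len();
    let total_resp: f64 = resp.iter().sum();
    let mut cov = vec![vec![0.0; dim]; dim];
    for (point, &r) in data.iter().zip(resp) {
        // Outer product of the centered sample, weighted by the responsibility.
        for i in 0..dim {
            for j in 0..dim {
                cov[i][j] += r * (point[i] - mean[i]) * (point[j] - mean[j]);
            }
        }
    }
    for row in cov.iter_mut() {
        for v in row.iter_mut() {
            *v /= total_resp; // normalize by the total responsibility, not by n
        }
    }
    cov
}
```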

andrewcsmith commented 7 years ago

I went ahead and improved a load of other things we discussed. It's pretty solid now. It still occasionally fails, but steadily increasing the regularization constant on each failure usually fixes that.
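
The regularization idea, sketched out (illustrative, not the PR's exact code): add a small constant to the covariance diagonal before factorizing, and grow it whenever the factorization fails.

```rust
/// Add `eps` to the covariance diagonal to keep it positive definite.
fn regularize(cov: &mut [Vec<f64>], eps: f64) {
    for i in 0..cov.len() {
        cov[i][i] += eps;
    }
}

/// Retry with an increasing constant until the factorization succeeds.
/// `cholesky` here is any routine returning None for a non-SPD matrix.
fn factorize_with_retry(
    mut cov: Vec<Vec<f64>>,
    cholesky: impl Fn(&[Vec<f64>]) -> Option<Vec<Vec<f64>>>,
) -> Option<Vec<Vec<f64>>> {
    let mut eps = 1e-6;
    for _ in 0..10 {
        if let Some(l) = cholesky(&cov[..]) {
            return Some(l);
        }
        regularize(&mut cov, eps);
        eps *= 10.0;
    }
    None
}
```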

AtheMathmo commented 7 years ago

I've had a brief look and things look very solid! I'll hopefully have time to properly review this in the next few hours.

Thanks for all the effort you've put into this, I'm really excited to check it out!

AtheMathmo commented 7 years ago

Actually, I might have spoken too soon. I just ran a GMM simulation - sampling data from 3 Gaussian clusters with specified means and variances to see if it could recover the parameters. Sadly it consistently failed to do so, though not dramatically.

I'll try to get a PR into your branch shortly to add this as an example in the repo - it's not the prettiest, but it seems like it will be valuable for us as developers and hopefully for some users.
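
The simulation is along these lines (an illustrative sketch only, not the actual example code; the hand-rolled LCG and Box-Muller sampler are just to keep the sketch dependency-free):

```rust
struct Lcg(u64);

impl Lcg {
    /// Uniform sample in (0, 1].
    fn uniform(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        ((self.0 >> 11) as f64 + 1.0) / (1u64 << 53) as f64
    }

    /// Standard normal sample via Box-Muller.
    fn normal(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }
}

fn main() {
    let mut rng = Lcg(42);
    // Three 1-D clusters with known means and standard deviations.
    let params = [(-3.0, 0.5), (0.0, 1.0), (4.0, 0.7)];
    let mut data = Vec::new();
    for &(mean, sd) in &params {
        for _ in 0..500 {
            data.push(mean + sd * rng.normal());
        }
    }
    // `data` would then be fed to the GaussianMixtureModel and the fitted
    // means/covariances compared against `params`.
    println!("generated {} samples", data.len());
}
```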

andrewcsmith commented 7 years ago

Cool. I didn't really do much in the way of A/B testing here. I'll see if I can tweak the parameters to get this to work better, or otherwise try to figure out what the problem is (probably next week, though).

AtheMathmo commented 7 years ago

Hopefully I'll have some time to dig into it in the next few days. It seems pretty minor so hopefully something will jump out.

AtheMathmo commented 7 years ago

I finally got a little time to look at this. I checked over everything and compared it to the scikit-learn implementation - sadly I couldn't spot anything incorrect. It is failing even in the 1D case (as in the example added), so I don't think the Cholesky changes are the whole story. I'll try to find some time to dig around a little more, but we may just have to step through and compare against a working implementation to spot where things go wrong...

andrewcsmith commented 7 years ago

I bounced back and forth between the CholeskyFull and Diagonal interpretations, and while they're both wrong, CholeskyFull generally gives higher values for the variance, although it also tends to overestimate the variance of the component with a mean of 0.0.

I'm getting fairly consistent results, but they're just not particularly correct. I believe the problem is external to the impl CovOption blocks; my guess is that it's somewhere in the update code. The variances are wildly off, which leads me to believe that the calculation of the model responsibilities is off. That's just a guess.
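
For reference, the responsibility calculation I mean is the E-step, which is usually done in log-space for numerical stability. A sketch (illustrative, not the PR's code): `log_density[i][k]` is log N(x_i | mu_k, sigma_k) and `weights[k]` is the mixture weight pi_k.

```rust
fn responsibilities(log_density: &[Vec<f64>], weights: &[f64]) -> Vec<Vec<f64>> {
    log_density
        .iter()
        .map(|row| {
            // log(pi_k) + log N(x_i | mu_k, sigma_k) for each component k.
            let log_weighted: Vec<f64> = row
                .iter()
                .zip(weights)
                .map(|(ld, w)| ld + w.ln())
                .collect();
            // log-sum-exp over components gives the log of the denominator.
            let max = log_weighted.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            let log_norm = max + log_weighted.iter().map(|v| (v - max).exp()).sum::<f64>().ln();
            // r_ik = exp(log_weighted_k - log_norm).
            log_weighted.iter().map(|v| (v - log_norm).exp()).collect()
        })
        .collect()
}
```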

(I'm only commenting because I've been poking around for an hour or so and want to save these notes before moving on with my life. Sorry if it's not particularly helpful.)

AtheMathmo commented 7 years ago

Sat down with this again today.

I've been unable to track down the issue, but I agree with you that it is probably in the update (or at least the Gaussian estimation) code.

I think the best approach to finding the issue is to write unit tests for each part of the process, using the scikit-learn implementation as a reference - basically checking that each function computes what we expect for some dummy input.

The good news is that the new assert_matrix_eq! macro should make this a lot easier to do. I'd like to sit down and get this done, but I'm not sure when I'll have time. Maybe by next weekend I could take a look...
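
Something along these lines, sketched out (a hypothetical test; the matrices below are placeholders, not real scikit-learn reference values - the idea is to run a fixed dummy input through one step of the algorithm and compare against the externally computed result):

```rust
#[macro_use]
extern crate rulinalg;

use rulinalg::matrix::Matrix;

#[test]
fn covariance_update_matches_reference() {
    // Value our update step produced for a fixed dummy input (placeholder).
    let actual = Matrix::new(2, 2, vec![0.2872, 0.0798,
                                        0.0798, 0.2844]);
    // Reference value computed externally for the same input (placeholder).
    let expected = Matrix::new(2, 2, vec![0.2872, 0.0798,
                                          0.0798, 0.2844]);
    // Elementwise comparison with an absolute tolerance.
    assert_matrix_eq!(actual, expected, comp = abs, tol = 1e-8);
}
```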

AtheMathmo commented 7 years ago

I'm going to try to tackle this again when https://github.com/AtheMathmo/rulinalg/pull/150 is merged.

That should make this work a little easier to complete.