Closed sbos closed 10 years ago

When I run the EM algorithm on my data I suddenly get errors like this:

Another strange thing is that the per-point log-likelihoods printed by the history function are always positive, starting from zero on the first iteration and not changing during subsequent iterations (nor does the iteration count change).

This looks like a serious error. Do I understand correctly that you estimate a full covariance model through splitting? What happens if you use k-means initialization?
That worked better and my code got further, but then I got another error:
ERROR: Covariance 1 not positive definite
in error at ./error.jl:21
in GMM at /home/sbos/.julia/v0.3/GaussianMixtures/src/gmmtypes.jl:143
in GMM at /home/sbos/.julia/v0.3/GaussianMixtures/src/gmmtypes.jl:60
in GMMk at /home/sbos/.julia/v0.3/GaussianMixtures/src/train.jl:99
in GMM at /home/sbos/.julia/v0.3/GaussianMixtures/src/train.jl:31
in include212 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so
in include_from_node1 at ./loading.jl:128
in process_options1743 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so
in _start1731 at /usr/bin/../lib/x86_64-linux-gnu/julia/sys.so (repeats 2 times)
This is pretty weird. Those covariances are generated by cov() from Base. They should surely be positive definite.
Ehm... what are the size and orientation of your data? Is it degenerate in some way? The index over data points should be the first in the array, and the index over dimensions the second. This may be different from how you store your data.
Well, I tried to align my data with the format mentioned in the README: data[i, :] is the i-th data point and data[:, j] is the j-th dimension for all points. Is that right?
Initialization breaks on 198 100-dimensional objects (I model 2 gaussians). Here is the dataset I'm working with https://gist.github.com/sbos/018955227319ead18529
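For reference, a minimal sketch of that layout, using random stand-in data with the same shape as the gist; the exact call that triggers the error is an assumption here:

using GaussianMixtures

x = randn(198, 100)    # stand-in for the gist data: 198 points (rows) in 100 dimensions (columns)
x[1, :]                # the first data point, a 100-element vector
x[:, 1]                # the first dimension across all 198 points
GMM(2, x, kind=:full)  # two Gaussians with full covariance, as described above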
OK, thanks. Yes, well, you have very few data points (only 198) and a high dimensionality (100). For a full covariance matrix you need at least as many data points as you have dimensions (but preferably many more), and even with only two Gaussians you already have too few data points to fill both covariance matrices in a sensible way.
I should put in a check that after k-means initialization there are enough data points for every Gaussian.
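In other words, the cov() results are not positive definite because the sample covariance of n points has rank at most n - 1, so once k-means assigns fewer points than dimensions to a cluster, that cluster's covariance matrix is singular. A quick check of this, written for a current Julia where cov and isposdef live in the Statistics and LinearAlgebra standard libraries (the cluster size 99 is just an illustration):

using Statistics, LinearAlgebra

y = randn(99, 100)   # a cluster with fewer points (99) than dimensions (100)
C = cov(y)           # 100x100 sample covariance; its rank is at most 98
isposdef(C)          # false: the matrix is singular, so the Cholesky factorization fails
rank(C)              # 98 (numerically)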
The best I can do for you is diagonal covariance, as in:
gd = GMM(2, x, kind=:diag)
Then, if you want to see full covariance happening, you can generate some extra data:
xx = rand(gd, 10000)
gf = GMM(2, xx, kind=:full)
Further, it is advisable to normalize the data in some way, e.g., to zero mean and unit variance, as you can do with MFCC.znorm() (this is a wrong name, I realize now) or StatsBase.zscore(x, 1). This also explains why you see a very high average log-likelihood: if the variance of x goes to zero, the likelihood of the model goes up without bound.
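A minimal sketch of that normalization, written against a current Julia and standardizing each dimension by hand rather than relying on the exact zscore signature mentioned above (x is the 198x100 data matrix):

using Statistics, GaussianMixtures

xz = (x .- mean(x, dims=1)) ./ std(x, dims=1)   # zero mean, unit variance per dimension
gd = GMM(2, xz, kind=:diag)                     # diagonal covariance, feasible for 198 points
avll(gd, xz)                                    # average log-likelihood of the data under the model (as used below)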
Oh, shame on me, of course 198 points is not enough! Last time I worked with Gaussian mixtures I did full inference with a Normal-Wishart prior, so that wasn't a problem. Thank you for your help.
OK, well, if you want to give it a try, you can do that (I believe) using the variational Bayes support (this is all quite new; I've been working hard on that for the last month).
gd = GMM(2, x, kind=:diag)
prior = GMMprior(gd.d, 0.1, 1.)
vg = VGMM(gd, prior) ## initialize a variational Bayes GMM
em!(vg, x, nIter=10)
history(vg) ## one gaussian was dropped, not much of a mixture!
gf = GMM(vg) ## now we don't have non-posdef covariances anymore
avll(gf, x)
Great, I think I will try this. I have my own Gaussian mixture implementation, https://github.com/sbos/DPMM.jl, but it is currently coupled with Dirichlet process code, which is not what I actually need right now, so I'm looking for alternatives.