AlexanderFabisch / gmr

Gaussian Mixture Regression
https://alexanderfabisch.github.io/gmr/
BSD 3-Clause "New" or "Revised" License

Problem with NaN in Cholesky decomposition #7

Closed: dmronga closed this issue 4 years ago

dmronga commented 4 years ago

Hi,

When I fit a GMM to data, I sometimes get the following error:

Traceback (most recent call last):
  File "test_nan_problem.py", line 9, in <module>
    model.from_samples(frame.values)
  File "build/bdist.linux-x86_64/egg/gmr/gmm.py", line 94, in from_samples
  File "build/bdist.linux-x86_64/egg/gmr/gmm.py", line 160, in to_responsibilities
  File "build/bdist.linux-x86_64/egg/gmr/mvn.py", line 105, in to_probability_density
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
    check_finite=check_finite)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_cholesky.py", line 20, in _cholesky
    a1 = asarray_chkfinite(a)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1022, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

Whether the error occurs depends on the data and on the GMM parameters. For example, with the same data, the error may not occur when I use a different random state.
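
One plausible mechanism, consistent with the traceback (the failure happens in `to_probability_density` while responsibilities are being computed), is that a component sits so far from every sample that its density underflows to exactly 0. The M-step then divides by a total responsibility of 0, the component's mean and covariance become NaN, and SciPy's `cholesky`, which checks for finite input by default, raises on the next E-step. The following is a minimal sketch of that chain using plain NumPy/SciPy on synthetic data; the far-away mean is a hypothetical value, and this is not gmr's actual code.

```python
import numpy as np
from scipy.linalg import cholesky
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # synthetic data around the origin

# Hypothetical component placed far away from every sample
far_mean = np.array([1000.0, 1000.0])
far_cov = np.eye(2)

# E-step: the density of every sample under this component underflows to 0.0
dens = multivariate_normal(far_mean, far_cov).pdf(X)

# M-step: normalizing the weights divides 0 by 0, which yields NaN
with np.errstate(invalid="ignore"):
    w = dens / dens.sum()
new_mean = w @ X
diff = X - new_mean
new_cov = (w[:, None] * diff).T @ diff  # weighted covariance, now all NaN

# Next E-step: scipy.linalg.cholesky rejects non-finite input by default
try:
    cholesky(new_cov)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(dens.max(), error_message)
```

Because the underflow depends on where the components start, this would also explain why changing the random state can make the error disappear.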

I am working on the master branch with Python 2.7 and NumPy 1.11.0. The code to reproduce the error is:

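import pandas as pd
from gmr import GMM

# Load the attached data (columns separated by single spaces)
frame = pd.read_csv("data.txt", sep=" ")
# Seed for which the error occurs with this data
random_state = 1578569639
model = GMM(n_components=7, random_state=random_state)
# Raises "ValueError: array must not contain infs or NaNs"
model.from_samples(frame.values)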

I attached the data file: data.txt

Best, Dennis

AlexanderFabisch commented 4 years ago

One of the Gaussians seems to be too far off from the data. In this case it makes sense to reinitialize it. I will take a look. A quick workaround is to limit the number of EM iterations:

model.from_samples(frame.values, n_iter=10)
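
The reinitialization suggested above could look roughly like the following, applied between EM iterations: any component whose total responsibility is (numerically) zero, or whose parameters are no longer finite, gets its mean reset to a randomly chosen sample, its covariance reset to the covariance of the whole data set, and a uniform prior before the priors are renormalized. This is a hypothetical sketch: the helper name `reinitialize_degenerate_components`, the `min_weight` threshold, and the reset rules are assumptions, and it is not necessarily what #8 implements.

```python
import numpy as np
from scipy.stats import multivariate_normal


def reinitialize_degenerate_components(X, priors, means, covariances,
                                       responsibilities, rng, min_weight=1e-8):
    """Hypothetical helper: reset components that no sample supports.

    Modifies priors, means and covariances in place and returns them.
    """
    n_samples = X.shape[0]
    n_components = len(priors)
    # Effective number of samples assigned to each component (NaN counts as 0)
    weight = np.nansum(responsibilities, axis=0)
    data_cov = np.atleast_2d(np.cov(X, rowvar=False))
    for k in range(n_components):
        degenerate = (
            weight[k] < min_weight
            or not np.all(np.isfinite(means[k]))
            or not np.all(np.isfinite(covariances[k]))
        )
        if degenerate:
            means[k] = X[rng.integers(n_samples)]  # restart at a random sample
            covariances[k] = data_cov              # broad, well-conditioned covariance
            priors[k] = 1.0 / n_components
    priors /= priors.sum()
    return priors, means, covariances


# Usage on synthetic data: the second component starts far from all samples
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
priors = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [1000.0, 1000.0]])
covariances = np.array([np.eye(2), np.eye(2)])

# E-step: weighted densities, normalized per sample
R = np.column_stack([
    p * multivariate_normal(m, c).pdf(X)
    for p, m, c in zip(priors, means, covariances)
])
R /= R.sum(axis=1, keepdims=True)

priors, means, covariances = reinitialize_degenerate_components(
    X, priors, means, covariances, R, rng)
print(means[1], priors)
```

Compared with lowering `n_iter`, which only stops EM before a component degenerates, resetting the component keeps all `n_components` Gaussians in use.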

AlexanderFabisch commented 4 years ago

This should be fixed by #8.