I believe I have realized a way to achieve this: first perform a KMeans clustering, then create a MultivariateMixture using a pooled covariance estimated from the KMeans covariances, and finally initialize the GaussianMixtureModel with that MultivariateMixture. I use the following snippet to calculate the pooled covariance:
private double[][] PooledCovariance(double[][][] covariances, double[] n)
{
    int nComponents = n.Length;

    // Total degrees of freedom: the sum of (n_i - 1) over all components
    int count = (int)Math.Round(n.Sum(x => x - 1));

    double[][] covPool = new double[covariances[0].Length][];
    for (int i = 0; i < covPool.Length; i++)
        covPool[i] = new double[covariances[0][0].Length];

    // Weighted average of the component covariances, each weighted
    // by its share of the total degrees of freedom
    for (int i = 0; i < nComponents; i++)
    {
        double weight = (n[i] - 1) / count;
        covPool = covPool.Add(covariances[i].Multiply(weight));
    }

    return covPool;
}
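For example (the covariances and cluster sizes below are made up for illustration; Add and Multiply are the Accord.Math matrix extensions):

// Two hypothetical KMeans clusters with 60 and 40 points
double[][] covA = { new[] { 1.0, 0.2 }, new[] { 0.2, 1.0 } };
double[][] covB = { new[] { 2.0, 0.0 }, new[] { 0.0, 0.5 } };

double[][] pooled = PooledCovariance(new[] { covA, covB }, new[] { 60.0, 40.0 });
// pooled == (59 * covA + 39 * covB) / 98, i.e. a degrees-of-freedom weighted average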
On closer examination it would seem that the previous suggestion does not hold. Rather, I would need to recalculate the shared covariance, and presumably also the Cholesky decomposition, after each EM iteration and use this to update the pdf's. This is currently not possible as I see it, since the EM only works against the IFittableDistribution interface, which does not expose the covariance matrix.
Hi there,
Many thanks for reporting the issue! I think the best way to add support for this would be to add a new parameter to NormalOptions to indicate that a particular covariance matrix (or Cholesky decomposition) should be used in the distribution (and not estimated from the data). The pooled covariance matrix could then be computed in GaussianMixtureModel.Learn(x) and stored in the NormalOptions object before it is passed along to the Fit() method of each MultivariateNormalDistribution.
However, I have to say that if you want the model to behave more like k-means, I suppose what you need is to force the model to have identity matrices as covariances. In that case, what can be done is to add a new parameter to NormalOptions specifying that the covariance matrices should not be updated inside the .Fit() method.
Hi Cesar, thanks for replying. To quote the Matlab documentation on shared covariance:
Shared covariance matrices indicate that all components have the same covariance matrix. All ellipses are the same size and have the same orientation. This specification is more parsimonious than the unshared specification because the total number of parameters only increases by the number of covariance parameters for one component.
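To put numbers on that: in d = 3 dimensions a full covariance matrix has d(d+1)/2 = 6 free parameters, so a 5-component mixture needs 5 × 6 = 30 covariance parameters when each component has its own matrix, but only 6 when all components share one.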
i.e. the point is not to obtain a different way of calculating KMeans, but rather to restrict the form of the resulting clusters. I agree that NormalOptions would be a perfect place for this, but it has to be done in the iterative part of the algorithm, i.e. in the ExpectationMaximization's (EM) compute() method, since the matrix must be recalculated and set on each iteration. But the EM only sees the IFittingOptions and IFittableDistribution interfaces, which expose neither the covariances nor a way to set them.
As a sidenote, can you explain why the weighted covariance matrix is not normalized by the sample size, as is done in the unweighted case? Instead, it is normalized by a factor derived from the sum of weights.
Many thanks, now I see the issue. Then maybe a possibility to achieve this without changing the IFittableDistribution interface and/or the EM class would be to store the pooled covariance matrix in the NormalOptions, and compute it in the MultivariateNormalDistribution Fit() method inside a lock() { } block. The first component distribution that passes through that section and sees that the pooled covariance has not been computed yet would compute and store it in the NormalOptions, and the others would simply re-use it. Do you think this could work?
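A minimal sketch of that pattern, with purely illustrative names (the framework has no such members today):

using System;

class PooledState   // hypothetical shared state that would live inside NormalOptions
{
    public double[,] PooledCovariance;
}

void FitComponent(PooledState shared, Func<double[,]> computePooled)
{
    lock (shared)
    {
        // The first component to arrive computes the matrix; later ones reuse it
        if (shared.PooledCovariance == null)
            shared.PooledCovariance = computePooled();
    }
    // ... use shared.PooledCovariance as this component's covariance ...
}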
About the covariance matrix: when computing a weighted covariance matrix where the weights are given as frequencies, the information about the sample size has been practically lost. For example, let's say that we have a sample size of 1000 but where just two samples have weight 0.5 and the rest are all 0. While theoretically there would be 1000 samples globally, in practical terms we would only have 2 samples to estimate the covariance matrix. As such, instead of dividing by the sample size, the WeightedCovariance function divides it by the sum of weights (see here for an example where the covariance matrix is divided by the sum of weights - the only difference is that the framework is computing the unbiased weighted estimation by default instead of the population estimate).
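To make that concrete, here is a minimal 1-D sketch of the idea (illustrative code, not the framework's WeightedCovariance, which additionally applies an unbiased correction):

using System.Linq;

static double WeightedVariance(double[] x, double[] w)
{
    double wSum = w.Sum();   // the effective sample size, not x.Length

    double mean = 0;
    for (int i = 0; i < x.Length; i++)
        mean += w[i] * x[i];
    mean /= wSum;

    double ss = 0;
    for (int i = 0; i < x.Length; i++)
        ss += w[i] * (x[i] - mean) * (x[i] - mean);

    return ss / wSum;        // normalize by the sum of weights
}

In the example above, wSum = 0.5 + 0.5 = 1, so the estimate is driven entirely by the two non-zero-weight samples regardless of the nominal sample size of 1000.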
I wish it could be so elegant; the trouble is that to estimate the pooled covariance one must use the covariances, or at least the means and weights, of all the distributions involved. I can't really see any other place to do this than in the EM compute() method itself. After performing all the fits, the new values can be used to update the covariance matrices and corresponding Cholesky decompositions for each distribution; the pooled covariance will only be computed once per iteration. The pdf's will then be replaced by new distributions initialized with the fitted means and the pooled covariance. I guess a property in EM (and LogEM) to indicate whether to perform this step would be the easiest way. The difference with Matlab is that they do not parallelize the fitting, so they can update the covariance matrix incrementally.
Many thanks for the breakdown of weighted covariance. I see now that the computed "factor" will act as the inverse of the effective number of data points assigned to the component.
Here are the changes I've made to allow this feature:
private double compute(TObservation[] observations, double[] weights)
{
    // Estimation parameters
    double[] coefficients = Coefficients;
    var components = Distributions;

    double weightSum = 1;
    if (weights != null)
        weightSum = weights.Sum();

    // 1. Initialize means, covariances and mixing coefficients
    //    and evaluate the initial value of the log-likelihood
    int N = observations.Length;

    // Initialize responsibilities
    double[] norms = new double[N];
    for (int k = 0; k < Gamma.Length; k++)
        Gamma[k] = new double[N];

    // Clone the current distribution values
    double[] pi = (double[])coefficients.Clone();
    var pdf = new IFittableDistribution<TObservation>[components.Length];
    for (int i = 0; i < components.Length; i++)
        pdf[i] = (IFittableDistribution<TObservation>)components[i];

    int observationDim = 1;
    if (typeof(TObservation).IsArray)
        observationDim = (observations[0] as Array).Length;

    // Check whether to apply a pooled covariance matrix
    double[,] SPool = null;
    if (InnerOptions != null && InnerOptions is NormalOptions)
    {
        if ((InnerOptions as NormalOptions).SharedCovariance)
            SPool = Matrix.Create(observationDim, observationDim, 0.0);
    }

    // Prepare the iteration
    Convergence.NewValue = LogLikelihood(pi, pdf, observations, weights, weightSum);

    // Start
    do
    {
        // 2. Expectation: Evaluate the component distributions'
        //    responsibilities using the current parameter values.
        Parallel.For(0, Gamma.Length, k =>
        {
            double[] gammak = Gamma[k];
            for (int i = 0; i < observations.Length; i++)
                gammak[i] = pi[k] * pdf[k].ProbabilityFunction(observations[i]);
        });

        Parallel.For(0, norms.Length, i =>
        {
            double sum = 0;
            for (int k = 0; k < Gamma.Length; k++)
                sum += Gamma[k][i];
            norms[i] = sum;
        });

        try
        {
            Parallel.For(0, Gamma.Length, k =>
            {
                double[] gammak = Gamma[k];

                for (int i = 0; i < gammak.Length; i++)
                    gammak[i] = (norms[i] != 0) ? gammak[i] / norms[i] : 0;

                if (weights != null)
                {
                    for (int i = 0; i < weights.Length; i++)
                        gammak[i] *= weights[i];
                }

                double sum = gammak.Sum();
                if (Double.IsInfinity(sum) || Double.IsNaN(sum))
                    sum = 0;

                // 3. Maximization: Re-estimate the distribution parameters
                //    using the previously computed responsibilities
                pi[k] = sum;

                if (sum == 0)
                    return;

                for (int i = 0; i < gammak.Length; i++)
                    gammak[i] /= sum;

                pdf[k].Fit(observations, gammak, InnerOptions);
            });
        }
        catch (AggregateException ex)
        {
            if (ex.InnerException is NonPositiveDefiniteMatrixException)
                throw ex.InnerException;
        }

        // 3b. If a shared covariance was requested, replace each component's
        //     covariance with the pooled estimate computed from all the fits
        if (SPool != null)
        {
            SPool = PooledCovariance(pdf, pi, observationDim);
            for (int k = 0; k < pdf.Length; k++)
            {
                pdf[k] = (IFittableDistribution<TObservation>)
                    new Multivariate.MultivariateNormalDistribution(
                        (pdf[k] as Multivariate.MultivariateNormalDistribution).Mean, SPool);
            }
        }

        double sumPi = pi.Sum();
        for (int i = 0; i < pi.Length; i++)
            pi[i] /= sumPi;

        // 4. Evaluate the log-likelihood and check for convergence
        Convergence.NewValue = LogLikelihood(pi, pdf, observations, weights, weightSum);

    } while (!Convergence.HasConverged);

    double newLikelihood = Convergence.NewValue;
    if (Double.IsNaN(newLikelihood) || Double.IsInfinity(newLikelihood))
        throw new ConvergenceException("Fitting did not converge.");

    // Become the newly fitted distribution.
    for (int i = 0; i < pi.Length; i++)
        Coefficients[i] = pi[i];

    for (int i = 0; i < pdf.Length; i++)
        Distributions[i] = pdf[i];

    return newLikelihood;
}
public static double[,] PooledCovariance(IFittableDistribution<TObservation>[] distributions, double[] weights, int dim)
{
    double weightSum = weights.Sum();
    double[,] pooledCov = Matrix.Create<double>(dim, dim, 0);

    // Weighted average of the component covariances
    for (int k = 0; k < distributions.Length; k++)
    {
        var dist = distributions[k] as Multivariate.MultivariateNormalDistribution;
        pooledCov = pooledCov.Add(dist.Covariance.Multiply(weights[k]));
    }

    if (weightSum != 0)
        pooledCov = pooledCov.Divide(weightSum);

    return pooledCov;
}
This is in addition to adding the SharedCovariance property to NormalOptions. So far it seems to be working as intended, but more testing is required.
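Usage would then look something like the following sketch (assuming the proposed SharedCovariance flag, and that MixtureOptions.InnerOptions is forwarded to the EM's InnerOptions as the code above suggests; the data is made up and the exact overloads may differ):

using Accord.Math;
using Accord.Statistics.Distributions.Fitting;
using Accord.Statistics.Distributions.Multivariate;

// Made-up two-cluster data, for illustration only
double[][] observations =
{
    new[] { 0.1, 0.2 }, new[] { -0.3, 0.1 }, new[] { 0.2, -0.1 },
    new[] { 5.2, 4.9 }, new[] { 4.8, 5.1 }, new[] { 5.1, 5.3 },
};

double[,] id = Matrix.Identity(2);
var mixture = new MultivariateMixture<MultivariateNormalDistribution>(
    new MultivariateNormalDistribution(new[] { 0.0, 0.0 }, id),
    new MultivariateNormalDistribution(new[] { 5.0, 5.0 }, id));

// SharedCovariance is the proposed new flag, not part of the released framework
mixture.Fit(observations, null, new MixtureOptions()
{
    InnerOptions = new NormalOptions() { SharedCovariance = true }
});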
Hi Franktore,
Many thanks for the contribution! I would still like to try a few ideas to make it more generalizable, or at least to avoid testing for specific fitting-option types from the EM class, but it will certainly be a simple refactoring of the code you just shared. Thanks a lot!
If you find issues, or have further suggestions, please let me know!
Regards, Cesar
Hi Cesar, it is a pleasure to give my thoughts and ideas to support this much appreciated project. On that note, I have discovered what appears to be a small bug in the MultivariateNormalDistribution class. In the Fit() method I see that when computing the unweighted covariance it always defaults to the full matrix, even when options.Diagonal is set:
// Compute covariance matrix
if (options != null && options.Diagonal)
    cov = Matrix.Diagonal(Measures.Variance(observations, means));
cov = Measures.Covariance(observations, means).ToMatrix();
I'm assuming an else-statement has been lost there.
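That is, the intended code is presumably:

// Compute covariance matrix (with the missing else restored)
if (options != null && options.Diagonal)
    cov = Matrix.Diagonal(Measures.Variance(observations, means));
else
    cov = Measures.Covariance(observations, means).ToMatrix();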
Best regards, Frank
Integrated in release 3.4.0.
Hi, is there a way to make the gmm use a shared covariance matrix, i.e. indicate that all components have the same covariance matrix? This is a feature of the Matlab gmm implementation, and it restricts the model to be more similar to KMeans. I assume that what Matlab calls a shared covariance matrix is really the pooled covariance matrix.