accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1
4.48k stars 1.99k forks source link

Bug in Singular Value Decomposition #62

Closed krasin-ga closed 9 years ago

krasin-ga commented 9 years ago

There is example of using svd for lsi at http://nlp.stanford.edu/IR-book/html/htmledition/latent-semantic-indexing-1.html

When i'm using SingularValueDecomposition class to compute svd, i get different results. Here is program that reproduces example:

    internal class Program
    {
        private static void Main(string[] args)
        {
            SvdTest();
            Console.ReadLine();
        }

        private static void SvdTest()
        {
            var matrix = new double[,]
                         {
                             {1, 0, 1, 0, 0, 0}, 
                             {0, 1, 0, 0, 0, 0},
                             {1, 1, 0, 0, 0, 0},
                             {1, 0, 0, 1, 1, 0},
                             {0, 0, 0, 1, 0, 1},

                         };
            PrintMatrix(matrix, "Original");

            var svd = new SingularValueDecomposition(matrix,true,true,true,true);

            var u = svd.LeftSingularVectors;
            PrintMatrix(u, "U");

            var s = svd.DiagonalMatrix;
            PrintMatrix(s, "S");

            var vt = svd.RightSingularVectors.Transpose();
            PrintMatrix(vt, "Vt");

            var powerOfsigma = svd.Diagonal.Length;
            const int reducedPower = 2;

            var toRemove = powerOfsigma - reducedPower;

            var indexes = GetIndexes(powerOfsigma, toRemove);

            var semanticSpace = s.Remove(indexes, indexes)
                .Multiply(vt.Remove(indexes, null));

            PrintMatrix(semanticSpace, "reduced document semantic space model");
        }

        private static int[] GetIndexes(int value, int nTimes)
        {
            return IterateFromValue(value, nTimes).ToArray();
        }

        private static IEnumerable<int> IterateFromValue(int value, int nTimes)
        {
            for (int i = 0; i < nTimes; i++)
                yield return --value;
        }

        private static void PrintMatrix(double[,] doubles, string name)
        {
            Console.WriteLine(name);
            for (int i = 0; i < doubles.GetLength(0); i++)
            {
                for (int j = 0; j < doubles.GetLength(1); j++)
                    Console.Write(doubles[i, j].ToString("0.00") + "\t");
                Console.WriteLine();
            }
            Console.WriteLine();
        }
}
cesarsouza commented 9 years ago

Hi Krasin,

I still couldn't take a look on the issue, but most likely the code in the framework is correct. The fact is that the SVD is not unique, and different implementations may produce different (but still valid) results. Another thing is that it could be possible that the framework and the document you are reading are not following the same definitions for the output of the SVD. For example, some of the result matrices in the LeftSingularVectors or RightSingularVector properties might been transposed by the framework.

I saw you closed the issue, did you reach any conclusion about it?

In any case, many thanks for creating the issue! If indeed the framework's code is correct, I will leave it open so it can remind me to improve the current documentation. Thanks!

krasin-ga commented 9 years ago

Hi Cesar. Thanks for reply and great framework. I closed it after more research on svd computation, that lead me to conclusion that framework code is correct. However, i ended up using svdlibc for my needs (large sparse matrix ~200000x1000000, top N dimensions)