evelinag / Ariadne

F# library for Gaussian process models
http://evelinag.com/Ariadne
Apache License 2.0

Overflow in GaussianProcess.Predict() when Using Large Dataset #3

Open squirvel opened 3 years ago

squirvel commented 3 years ago

I am working with a rather large subset of data (922,854 location-position pairs). When calling GaussianProcess.Predict(), an integer overflow occurs, which results in an index-out-of-bounds error. (Specifically, the index is being set to -2146233066.)

I have narrowed the onset of this issue to somewhere between 45,000 and 50,000 pairs. Below that range, the library runs and produces results without error.
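The 45,000 to 50,000 window lines up with the point where n × n stops fitting in a signed 32-bit int. The sketch below assumes the training covariance matrix is n × n and that its element count is computed as rows * columns in Int32 arithmetic (the trace shows covarianceMatrix building a dense matrix). The helper names are illustrative only, not Ariadne or Math.NET APIs.

```csharp
using System;

// Largest n whose n*n product still fits in a signed 32-bit int.
static int LargestSquareSide()
{
    int n = (int)Math.Sqrt(int.MaxValue); // truncates 46340.95... to 46340
    // Nudge n in case floating-point rounding put it off by one.
    while ((long)(n + 1) * (n + 1) <= int.MaxValue) n++;
    while ((long)n * n > int.MaxValue) n--;
    return n;
}

// Element count as a 32-bit product: silently wraps in an unchecked context.
static int ElementsInt32(int rows, int columns) => unchecked(rows * columns);

// Element count in 64-bit arithmetic: cannot wrap for any pair of int inputs.
static long ElementsInt64(int rows, int columns) => (long)rows * columns;

int limit = LargestSquareSide();
Console.WriteLine($"Largest n with n*n <= int.MaxValue: {limit}");
foreach (int side in new[] { 45000, limit, limit + 1, 50000 })
{
    Console.WriteLine(
        $"n = {side,6}: int product = {ElementsInt32(side, side),14}, long product = {ElementsInt64(side, side),14}");
}
```

For n = 46,341 the true count (2,147,488,281) exceeds int.MaxValue (2,147,483,647), so the 32-bit product wraps to a negative number. That is consistent with a failure appearing between 45,000 and 50,000 points.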

A trace of the error is listed below:

   at MathNet.Numerics.LinearAlgebra.Storage.DenseColumnMajorMatrixStorage`1.OfInit(Int32 rows, Int32 columns, Func`3 init)
   at MathNet.Numerics.LinearAlgebra.MatrixBuilder`1.Dense(Int32 rows, Int32 columns, Func`3 init)
   at Ariadne.GaussianProcess.covarianceMatrix[T](FSharpFunc`2 kernel, T[] input1, T[] input2)
   at Ariadne.GaussianProcess.GaussianProcess`1.PosteriorGaussianProcess(IEnumerable`1 data, T[] newLocations)
   at Ariadne.GaussianProcess.GaussianProcess`1.Predict(IEnumerable`1 data, T[] newLocations)
   at package.Program.function(params) in C:\Users\squirvel\package\Program.cs:line xxx

I am calling the library from C# code, so I am unsure whether that affects anything. I do not think it does, but for the sake of completeness, here is what my code looks like:

// Generate Gaussian process
// Start with the covariance function; define its parameters first
float lengthscale = 0.5F;
float signalVariance = 0.3F;
float noiseVariance = 0.5F;

// Set up kernel function
Ariadne.Kernels.SquaredExp.SquaredExp kernelFunction = new Ariadne.Kernels.SquaredExp.SquaredExp(lengthscale, signalVariance, noiseVariance);
// Use kernel function to spawn process
Ariadne.GaussianProcess.GaussianProcess<double> gaussianProcess = kernelFunction.GaussianProcess();

// Points is an object I am unpacking values into these items from
double[] pointTimes = new double[points.Count];
double[] xPoints = new double[points.Count];
double[] yPoints = new double[points.Count];
// unpacking of values into above variables occurs here

// Generate Observation sets
int SIZE = 45000;  // Testing variable to help isolate around when the bug occurs
Ariadne.GaussianProcess.Observation<double> xObservations = new Ariadne.GaussianProcess.Observation<double>(pointTimes[0..SIZE], xPoints[0..SIZE]);
Ariadne.GaussianProcess.Observation<double> yObservations = new Ariadne.GaussianProcess.Observation<double>(pointTimes[0..SIZE], yPoints[0..SIZE]);

// Wrap in an enumerable (required by the Predict signature)
List<Ariadne.GaussianProcess.Observation<double>> xEnumerable = new List<Ariadne.GaussianProcess.Observation<double>>
{
     xObservations
};

// Attempt predictions
// TIMESTAMPS is predefined and holds the 10 locations I am attempting to get predictions for
Tuple<double[], double[]> xResults = gaussianProcess.Predict(xEnumerable, TIMESTAMPS);  // IndexOutOfBounds error occurs here

As can be seen in the trace, the failure is not occurring in Ariadne itself but in the underlying MathNet.Numerics library, specifically in LinearAlgebra.MatrixBuilder (called from Ariadne's covarianceMatrix).

My current speculation is that the list contained in xEnumerable is being cast to an array. Because the MathNet library is precompiled, likely without gcAllowVeryLargeObjects enabled, we are literally running out of space and overflowing. This could be avoided if we could work directly with lists rather than Int32-indexed arrays.

However, I am not sure that is the case, and even if it were, I have not been able to find any code that hints at such a cast.

squirvel commented 3 years ago

Actually, I think I might have found what I was looking for... I should have a patch to push sometime in the next day or two.

squirvel commented 3 years ago

Update: this is in fact an issue with Math.NET.

Specifically, the constructor:

internal DenseColumnMajorMatrixStorage(int rows, int columns)
    : base(rows, columns)
{
    Data = new T[rows*columns];
}

in DenseColumnMajorMatrixStorage.cs fails when multiplying rows*columns: the product is computed as a 32-bit int and overflows once it exceeds int.MaxValue (2,147,483,647).
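To illustrate what a guarded version of that allocation could look like, here is a minimal sketch that computes the element count in 64-bit arithmetic and fails with a descriptive message instead of handing a wrapped, negative length to `new T[...]`. `AllocateColumnMajor` is a hypothetical helper written for this example; it is not part of Math.NET, and it does not lift the underlying array-size limit.

```csharp
using System;

// Hypothetical helper (not Math.NET code): allocate column-major storage for a
// rows x columns matrix, checking the element count in 64-bit arithmetic first.
static T[] AllocateColumnMajor<T>(int rows, int columns)
{
    if (rows < 0 || columns < 0)
        throw new ArgumentOutOfRangeException(rows < 0 ? nameof(rows) : nameof(columns));

    long length = (long)rows * columns; // cannot wrap for int inputs
    if (length > int.MaxValue)
        throw new ArgumentException(
            $"A {rows} x {columns} matrix needs {length} elements, more than an Int32-indexed array can address.");

    return new T[length];
}

// A small matrix allocates normally; a 46341 x 46341 request is rejected up front.
Console.WriteLine(AllocateColumnMajor<double>(3, 4).Length);
```

This only turns a confusing failure into a clear one; fitting a larger n × n covariance still needs a different storage strategy or a smaller training set.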

This cannot be worked around by compiling Math.NET and Ariadne as 64-bit, because C# int is defined as a 32-bit type (System.Int32) on every platform, including 64-bit ones.
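A quick way to confirm this on any machine: `int` is an alias for `System.Int32`, so its size is fixed, while only pointer-sized types such as `IntPtr` change with the process bitness. This is a small standalone check, independent of Ariadne and Math.NET.

```csharp
using System;

// int is System.Int32 everywhere, so sizeof(int) is 4 even in a 64-bit process.
// IntPtr (nint) is the pointer-sized type whose width follows the process bitness.
Console.WriteLine($"64-bit process: {Environment.Is64BitProcess}");
Console.WriteLine($"sizeof(int):    {sizeof(int)}");  // always 4
Console.WriteLine($"sizeof(long):   {sizeof(long)}"); // always 8
Console.WriteLine($"IntPtr.Size:    {IntPtr.Size}");  // 8 in a 64-bit process, 4 in a 32-bit one
```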

If the ints in Math.NET were 64-bit and large objects were enabled (which they are when forcing a 64-bit build), then this error would not exist. (Better yet, using BigInteger and lists would ensure no size limits at all, without the 64-bit requirement.)

That said, I did write a patch that swaps out the arrays for F# lists (except for the hyperparameter arrays). I couldn't get the tests to run on my machine, but the patched library seems to run properly. If Math.NET used BigIntegers and lists instead of arrays, that patch might be useful. Regardless, if anyone would like me to post the patch, just let me know.

For now, I guess I will have to run a Gaussian process over a set of Gaussian process outputs, or just lazily average them.
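The "lazily average them" option can be sketched as follows: split the observations into subsets small enough for the covariance matrix to fit, predict with each subset, and average the predicted means. This is a naive average, not a principled Gaussian process approximation. The `predictMeans` delegate is an assumption standing in for a per-subset call to `gaussianProcess.Predict`; which item of the returned tuple holds the means is not stated in this thread. The strided split (points c, c + k, c + 2k, ...) is a design choice so that every subset spans the whole time range rather than one contiguous window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: average per-subset predictions so no single fit exceeds maxChunkSize points.
static double[] AveragePredictions(
    double[] times, double[] values, double[] newLocations, int maxChunkSize,
    Func<double[], double[], double[], double[]> predictMeans)
{
    if (times.Length == 0 || times.Length != values.Length)
        throw new ArgumentException("times and values must be non-empty and the same length");
    if (maxChunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxChunkSize));

    // Number of subsets needed so that none holds more than maxChunkSize points.
    int chunks = (times.Length + maxChunkSize - 1) / maxChunkSize;
    var sum = new double[newLocations.Length];

    for (int c = 0; c < chunks; c++)
    {
        // Strided subset: indices c, c + chunks, c + 2*chunks, ...
        var subTimes = new List<double>();
        var subValues = new List<double>();
        for (int i = c; i < times.Length; i += chunks)
        {
            subTimes.Add(times[i]);
            subValues.Add(values[i]);
        }

        double[] means = predictMeans(subTimes.ToArray(), subValues.ToArray(), newLocations);
        for (int k = 0; k < sum.Length; k++) sum[k] += means[k];
    }

    for (int k = 0; k < sum.Length; k++) sum[k] /= chunks;
    return sum;
}

// Usage with a stand-in predictor (each subset's mean value) just to show the plumbing.
double[] t = { 0, 1, 2, 3, 4, 5, 6 };
double[] y = { 1, 2, 3, 4, 5, 6, 7 };
double[] result = AveragePredictions(t, y, new double[] { 2.5 }, 3,
    (ts, vs, locs) => Array.ConvertAll(locs, _ => vs.Average()));
Console.WriteLine(result[0]); // 4
```

With 922,854 points, a `maxChunkSize` somewhat below 46,340 would keep each n × n covariance under the Int32 limit, assuming the training covariance is the matrix that overflows.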

squirvel commented 3 years ago

Reopened. There is no good reason to close this just because this library isn't the root cause of the problem.