EdwardRaff / JSAT

Java Statistical Analysis Tool, a Java library for Machine Learning
GNU General Public License v3.0
788 stars 204 forks source link

OrdinaryKriging Infiniy #97

Open MichaelTenma opened 2 years ago

MichaelTenma commented 2 years ago

In OrdinaryKriging.train(), when lup.det() is infinity that the LUPDecomposition will get a wrong X. However, I can get a right X when using singular value decomposition. So, I think it should be uesd SingularValueDecomposition instead of LUPDecomposition while lup.det() is infinity. Currently, just when lup.det() is NaN or very close to zero will use SingularValueDecomposition.

EdwardRaff commented 2 years ago

HI Michael, do you have an example of this problem you could share by chance?

Either way, it sounds like adding an infinity check to this line fixed it for you - is that correct?

MichaelTenma commented 2 years ago

Sorry, I could not provide the test data for you. The problem happen when I apply OrdinaryKriging for soil nutrient interplating, and I use the 3857 projected coordinate system, and the position near (12900000, 2850000).

I guess it might overflow while calculating lup.det(), and I use near 500 sample points to interplate soil nutrient.

Currently, I fix this problem by appending a condition:

if(Double.isInfinite(lup.det()) || Double.isNaN(lup.det()) || Math.abs(lup.det()) < 1e-5) {
    SingularValueDecomposition svd = new SingularValueDecomposition(V);
    X = svd.solve(Y);
}

And I also shift all the sample points to a original point by minus original point vector such as (12900000, 2850000), it is an operator to avoid overflowing. I got a better result in RMSE or MAE in test set than the old code in OrdinaryKriging. However, I could not prove it theologically.

public class FixOrdinaryKriging implements Regressor, Parameterized {
    private Vec firstOriginalVec = null;
    @Override
    public double regress(DataPoint data)
    {
        Vec x = data.getNumericalValues();
        if(this.firstOriginalVec != null){
            x = x.subtract(firstOriginalVec);
        }

       /* same as OrdinaryKriging */
        int npt = X.length()-1;
        double[] distVals = new double[npt+1];
        for (int i = 0; i < npt; i++)
            distVals[i] = vari.val(x.pNormDist(2, dataSet.getDataPoint(i).getNumericalValues()));
        distVals[npt] = 1.0;

        return X.dot(toDenseVec(distVals));
    }

    @Override
    public void train(RegressionDataSet dataSet, ExecutorService threadPool)
    {
        dataSet = dataSet.shallowClone();
        if(this.firstOriginalVec == null){
            this.firstOriginalVec = dataSet.getDataPoints().get(0).getNumericalValues().clone();
        }

        if(this.firstOriginalVec != null){
            for(DataPoint dataPoint : dataSet.getDataPoints()){
                dataPoint.getNumericalValues().mutableSubtract(this.firstOriginalVec);
            }
        }

        /* same as OrdinaryKriging */
        /* same as OrdinaryKriging */

        if(Double.isInfinite(lup.det()) || Double.isNaN(lup.det()) || Math.abs(lup.det()) < 1e-5)
        {
            SingularValueDecomposition svd = new SingularValueDecomposition(V);
            X = svd.solve(Y);
        }
    }
}

Sorry with my terrible English.