ensozos / Matrix-Profile

A Java library for Matrix Profile
https://ensozos.github.io/Matrix-Profile/
MIT License

Issues when trying to update ND4J dependency to 1.0.0-M2.1 #24

Open wolfig opened 1 year ago

wolfig commented 1 year ago

Hi Ensozos,

I am trying to use your library in a project for tuning a particle ion source (similar to what is shown here: Ion Source Optimization Using Bi-Objective Genetic and Matrix-Profile Algorithm). In the paper I used a Python implementation of matrix profile, but now I want to move the logic to Java. One goal is to update the nd4j dependency to version 1.0.0-M2.1, which unfortunately has breaking changes (introduced with nd4j 1.0.0-beta4).

One issue I am facing when running the unit tests with the updated dependency is that all calls of the type

INDArray.get(INDArrayIndex... indexes)

need to be two-dimensional now. This is not a general issue, but when changing the code in MatrixProfileCalculator::MPRunnable::run() from

    @Override
    public void run() {
        INDArray distanceProfile      = distProfile.getDistanceProfile(timeSeriesA, timeSeriesB, index, window);
        INDArray distanceProfileIndex = distProfile.getDistanceProfileIndex(tsBLength, index, window);

        if (trivialMatch) {
            INDArrayIndex[] indices = new INDArrayIndex[] { NDArrayIndex.interval(
                            Math.max(0, index - window / 2),
                            Math.min(index + window / 2 + 1, tsBLength)) };
            distanceProfile.put(indices, Double.POSITIVE_INFINITY);
        }

        updateProfile(distanceProfile, distanceProfileIndex);
    }

to (adding NDArrayIndex.all() to be compatible with nd4j 1.0.0-M2.1)

    @Override
    public void run() {
        INDArray distanceProfile      = distProfile.getDistanceProfile(timeSeriesA, timeSeriesB, index, window);
        INDArray distanceProfileIndex = distProfile.getDistanceProfileIndex(tsBLength, index, window);

        if (trivialMatch) {
            // changed: NDArrayIndex.all() added as the first index
            INDArrayIndex[] indices = new INDArrayIndex[] { NDArrayIndex.all(), NDArrayIndex.interval(
                            Math.max(0, index - window / 2),
                            Math.min(index + window / 2 + 1, tsBLength)) };
            distanceProfile.put(indices, Double.POSITIVE_INFINITY);
        }

        updateProfile(distanceProfile, distanceProfileIndex);
    }

I get errors like

  java.lang.IllegalStateException: Indices are out of range: Cannot get interval index Interval(b=0,e=5,s=1) on array with size(1)=4. Array shape: [1, 4], indices: [all(), Interval(b=0,e=5,s=1)]

for all "testMatrixProfileSelfJoin*" test cases of Matrix profile test. The cause of this error is that distanceProfile.put(...) (which calls INDArray.get()) fails, because the interval index in indices extends beyond the end of the distanceProfile array. This in turn happens because the interval index is created using the size of tsB, which is larger than distanceProfile.
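The bounds arithmetic behind that error message can be sketched in plain Java, without nd4j. The class name ExclusionZoneBounds and the concrete values (window = 4, index = 2, tsBLength = 7) are hypothetical, picked only because they reproduce Interval(b=0,e=5) on an array of size 4. The relation profileLength = tsBLength - window + 1 is the usual sliding-window count and is an assumption about how the library sizes distanceProfile.

```java
final class ExclusionZoneBounds {

    /** Start (inclusive) of the trivial-match interval, same formula as in run(). */
    static int start(int index, int window) {
        return Math.max(0, index - window / 2);
    }

    /** End (exclusive) of the trivial-match interval, clamped to the given limit. */
    static int end(int index, int window, int limit) {
        return Math.min(index + window / 2 + 1, limit);
    }

    public static void main(String[] args) {
        // Hypothetical values chosen to match the error message above.
        int window = 4;
        int index = 2;
        int tsBLength = 7;
        // Assumption: one distance per sliding window position.
        int profileLength = tsBLength - window + 1;

        int begin = start(index, window);
        int endFromSeries = end(index, window, tsBLength);      // clamped to the series length
        int endFromProfile = end(index, window, profileLength); // clamped to the profile length

        System.out.println("profile length:         " + profileLength);
        System.out.println("interval (series clamp): [" + begin + ", " + endFromSeries + ")");
        System.out.println("interval (profile clamp): [" + begin + ", " + endFromProfile + ")");
    }
}
```

With these values the series-based clamp yields an end of 5, which exceeds the profile length of 4, while the profile-based clamp stays within the array.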

One way to cure this is to change the code in MatrixProfileCalculator to

    @Override
    public void run() {
        INDArray distanceProfile      = distProfile.getDistanceProfile(timeSeriesA, timeSeriesB, index, window);
        INDArray distanceProfileIndex = distProfile.getDistanceProfileIndex(tsBLength, index, window);

        if (trivialMatch) {
            // changed: NDArrayIndex.all() added; upper bound is now distanceProfile.length()
            INDArrayIndex[] indices = new INDArrayIndex[] { NDArrayIndex.all(), NDArrayIndex.interval(
                            Math.max(0, index - window / 2),
                            Math.min(index + window / 2 + 1, distanceProfile.length())) };
            distanceProfile.put(indices, Double.POSITIVE_INFINITY);
        }

        updateProfile(distanceProfile, distanceProfileIndex);
    }

I am not sure if this is the right approach, as it changes the logic of calculating the matrix profile. With this change the tests no longer throw errors, but I get assertion failures in the tests Windows8, 2SawTeeth, 2Humps, ...; the tests Windows4, Windows5, StraightLine, and Plateau become green.

Could you have a look into this?

wolfig commented 1 year ago

I made some experiments. As a reference implementation, I use the Python stumpy library. Furthermore, I assume that your stmp-code is equivalent to stumpy's "stump" method. As test data I used the "targetSeriesWithPattern":

[0.6, 0.5, 2.00, 1.0, -1.01, -0.5, 1.0, 2.3, 4.0, 5.9, 4.2, 3.1, 3.2, 3.4, 2.9, 3.5, 1.05, -1.0, -0.50, 1.01, 2.41, 3.99, 6.01, 4.7, 3.2, 2.6, 4.1, 4.3, 1.1, 1.7, 3.1, 1.9, -0.5, 2.1, 1.9, 2.01, -0.02, 0.48, 2.03, 3.31, 5.1, 7.1, 5.1, 3.2, 2.3, 1.8, 2.1, 1.7, 1.1, -0.1, 2.1, 2.01, 3.9, 3.1, 1.05, -1.0, -0.5, 1.01, 2.41, 3.99, 6.01, 4.7, 4.5, 3.9, 2.1, 3.3, 3.1, 2.7, 1.9]

and calculated the stumpy.stump profile with window size 8 to be equivalent to your "Windows8" test case:

mp = stumpy.stump(data, m=8)

stumpy.stump produces (this is my reference):

1.978220936260314 0.83707925066267 0.4546809779055928 0.08726639430199337 0.17008405364304846 0.4098865841250271 0.6994474556577408 1.4518582387451167 1.3319005154718184 0.9451213501212021 1.8153057348935029 1.9904369580815713 1.1875958951401007 1.081292397096952 0.5911694103586415 0.16532701476470923 0.0 0.17008405364304846 0.4098865841250271 0.6994474556577408 1.4518582387451167 1.2224427563106743 1.459718781282131 1.9023070773167448 2.288602610987441 1.2224427563106743 1.6558969321997734 1.9023070773167448 2.1472026281338645 1.9278741619147393 1.9850484024035508 2.472148082913442 1.978220936260314 0.83707925066267 0.48804702922152227 0.08726639430199337 0.39649268281634986 0.7430980370551044 1.1167212838495193 1.354934919189077 1.3319005154718184 1.1392798574009926 0.9451213501212021 1.697424487450637 1.9327652893783562 1.33003115179361 1.5756094407644912 2.081034560837073 1.8508390280468399 1.1167212838495193 1.1875958951401007 1.081292397096952 0.5911694103586415 0.16532701476470923 0.0 0.5018020789530718 0.6484523316327658 1.3794821611001897 1.7919132200300565 1.4694559779059273 1.1392798574009926 1.3864057061299813
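For an independent cross-check on the Java side, a brute-force self-join can be sketched as follows. This is not stumpy's algorithm and not the library's STMP code. It is the plain O(n²·m) definition of a z-normalized Euclidean matrix profile, and the names BruteForceMatrixProfile, selfJoin and exclusionRadius are hypothetical. The exclusion radius is a parameter because implementations may choose it differently. Passing window / 2 mirrors the trivial-match interval used in run().

```java
import java.util.Arrays;

final class BruteForceMatrixProfile {

    /** Z-normalizes series[start .. start + m - 1] using the population standard deviation. */
    static double[] zNormalize(double[] series, int start, int m) {
        double mean = 0.0;
        for (int i = 0; i < m; i++) {
            mean += series[start + i];
        }
        mean /= m;

        double variance = 0.0;
        for (int i = 0; i < m; i++) {
            double d = series[start + i] - mean;
            variance += d * d;
        }
        double std = Math.sqrt(variance / m);

        double[] z = new double[m];
        for (int i = 0; i < m; i++) {
            // A constant subsequence has std == 0; map it to all zeros to avoid dividing by zero.
            z[i] = (std == 0.0) ? 0.0 : (series[start + i] - mean) / std;
        }
        return z;
    }

    /**
     * Self-join matrix profile: for every subsequence, the smallest z-normalized
     * Euclidean distance to any other subsequence j with |i - j| > exclusionRadius.
     */
    static double[] selfJoin(double[] series, int m, int exclusionRadius) {
        if (m < 1 || m > series.length) {
            throw new IllegalArgumentException("window must be in [1, series.length]");
        }
        int n = series.length - m + 1;

        // Pre-compute all normalized subsequences once.
        double[][] z = new double[n][];
        for (int i = 0; i < n; i++) {
            z[i] = zNormalize(series, i, m);
        }

        double[] profile = new double[n];
        Arrays.fill(profile, Double.POSITIVE_INFINITY);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (Math.abs(i - j) <= exclusionRadius) {
                    continue; // trivial match, skipped
                }
                double sum = 0.0;
                for (int k = 0; k < m; k++) {
                    double d = z[i][k] - z[j][k];
                    sum += d * d;
                }
                profile[i] = Math.min(profile[i], Math.sqrt(sum));
            }
        }
        return profile;
    }
}
```

Calling selfJoin(data, 8, 8 / 2) on the series above and comparing the result with both the stumpy output and the "Windows8" expected values, with the radius varied, is one way to see whether the remaining differences depend on the size of the exclusion zone.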

You provide expected values in your test case "Windows8". The difference array mp_values - test_case_expected looks like this (I set all values with a magnitude below 0.001 to zero):

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0022, 0.0, 0.0, 0.0, 0.0, -0.8473572436893255, 0.0, -0.03699292268325527, 0.0, -0.5861572436893256, -0.04150306780022661, -0.13219292268325522, 0.0, -0.016525838085260647, 0.0, -0.08145191708655775, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0022, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

The corresponding difference array mp_values - values_of_my_correction looks like this:

0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.8473572436893255, 0.0, -0.03699292268325527, 0.0, -0.5861572436893256, -0.04150306780022661, -0.13219292268325522, 0.0, -0.016525838085260647, 0.0, -0.08145191708655775, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0

The question is: what does this mean?
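For reproducibility, the difference arrays above can be computed with a few lines of Java. This is a minimal sketch: the class name ProfileDiff and the method thresholdedDifference are hypothetical, and the tolerance of 0.001 is the cutoff mentioned above, applied to the magnitude of each difference.

```java
final class ProfileDiff {

    /**
     * Element-wise reference[i] - candidate[i], with every difference whose
     * magnitude is below the tolerance replaced by zero.
     */
    static double[] thresholdedDifference(double[] reference, double[] candidate, double tolerance) {
        if (reference.length != candidate.length) {
            throw new IllegalArgumentException(
                    "length mismatch: " + reference.length + " vs " + candidate.length);
        }
        double[] diff = new double[reference.length];
        for (int i = 0; i < reference.length; i++) {
            double d = reference[i] - candidate[i];
            diff[i] = (Math.abs(d) < tolerance) ? 0.0 : d;
        }
        return diff;
    }
}
```

The explicit length check is deliberate: if the two profiles differ in length, the comparison fails loudly instead of silently comparing misaligned positions.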