ensozos / Matrix-Profile

A Java library for Matrix Profile
https://ensozos.github.io/Matrix-Profile/
MIT License

MatrixProfile.stamp does not give deterministic results for self-join case #4

Open barrybecker4 opened 5 years ago

barrybecker4 commented 5 years ago

For the versions of stamp and stmp that take a query parameter, I see that the results are the same between them. However, for the self-join case (no query provided), the results for stamp are non-deterministic. I'm sure this has to do with Random not using a seed, but shouldn't the result be the same even if the indices are processed in a different order each time?

When I run this test, I get different results for the second part of the pair nearly every time.

    INDArray shortTargetSeries = Nd4j.create(new double[]{0.0, 6.0, -1.0, 2.0, 3.0, 1.0, 4.0}, new int[]{1, 7});

    @Test
    public void testMatrixProfileSelfJoinStampWindow4() {
        int window = 4;
        Pair<INDArray, INDArray> expectedResultWhenSelfJoin = new Pair<>(
                Nd4j.create(new double[]{1.7308, POSITIVE_INFINITY, POSITIVE_INFINITY, 1.7308}, new int[]{1, 4}),
                Nd4j.create(new double[]{3.0000, 2.0000, 2.0000, 0}, new int[]{1, 4})
        );
        Pair<INDArray, INDArray> pair = matrixProfile.stamp(shortTargetSeries, window);
        assertEquals(expectedResultWhenSelfJoin.toString(), pair.toString());
    }

The first part of the pair in the result is always the same, but the second part is random. It has the form [[3.0000, x, x, 0]], where x is 0, 1, 2, or 3. Why? Is this a bug, or am I misunderstanding something?

ensozos commented 5 years ago

Good catch! The problem is that, before running stamp, we have to check whether the window is larger than half the size of the query; otherwise we run into problems with the indices. If it is larger, we cannot run stamp. The window size also cannot be smaller than 4. We have to make the same check for the simple stamp:

    public Pair<INDArray, INDArray> stamp(INDArray target, int window) {
        DistanceProfile stamp = distanceProfileFactory.getDistanceProfile(DistanceProfileFactory.STAMP);
        int query_size = (int) target.shape()[1];

        // Reject windows longer than half the series
        if (window > query_size / 2)
            throw new IllegalArgumentException(); // TODO create custom Exception

        // Reject windows shorter than 4
        if (window < 4)
            throw new IllegalArgumentException();

        return matrixProfile(target, window, new RandomOrder((int) (target.length() - window + 1)), stamp, target, true);
    }

So in your example the method will throw an exception. With this check, the results are the same as those of Eamonn's MATLAB program. I will continue testing this solution.
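The index problem for the 7-point example can be illustrated with a small sketch. It is only an illustration under stated assumptions: the class ExclusionZoneDemo and the method admissibleNeighbours are hypothetical and not part of the library, and the exclusion zone is assumed to cover every start j with |i - j| <= window / 2. Under that assumption it lists, for each subsequence start, the starts that remain eligible as nearest neighbours in a self-join.

```java
import java.util.ArrayList;
import java.util.List;

public final class ExclusionZoneDemo {
    private ExclusionZoneDemo() {}

    // Hypothetical helper, not part of the library.
    // For each subsequence start i, collects the starts j that fall outside
    // the ASSUMED exclusion zone |i - j| <= window / 2.
    public static List<List<Integer>> admissibleNeighbours(int seriesLength, int window) {
        int profileLength = seriesLength - window + 1;
        int radius = window / 2; // assumption: exclusion radius is window / 2
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < profileLength; i++) {
            List<Integer> neighbours = new ArrayList<>();
            for (int j = 0; j < profileLength; j++) {
                if (Math.abs(i - j) > radius) {
                    neighbours.add(j);
                }
            }
            result.add(neighbours);
        }
        return result;
    }

    public static void main(String[] args) {
        // Series of length 7 with window 4, as in the test above.
        System.out.println(admissibleNeighbours(7, 4)); // prints [[3], [], [], [0]]
    }
}
```

Positions 1 and 2 have no eligible neighbour at all, which is consistent with their profile distance being POSITIVE_INFINITY in the test; an index stored for such a position carries no information, so it may depend on processing order.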

barrybecker4 commented 5 years ago

I'm not sure I fully understand. Why doesn't it work if window < 4? In my test the window is 4, so shouldn't it work? Why doesn't it work if the window is greater than half the series length? It works for stmp, so why not for stamp? In my test, window = 4 is much less than half the series length (which is 69). When you make the fix, please put a message string in the exception so that the client has some diagnostic info to help them debug. Something like:

        if (window > query_size / 2)
            throw new IllegalArgumentException("The window size " + window + " cannot be > half the size of the series (" + query_size + ") because ...");
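A minimal, self-contained sketch of both checks with diagnostic messages could look like the following. The class WindowValidator and its validate method are hypothetical names used for illustration, not part of the library, and the message wording is only a suggestion; the reason for each limit is deliberately left out of the messages.

```java
public final class WindowValidator {
    private WindowValidator() {}

    // Hypothetical helper, not part of the library: mirrors the two checks
    // proposed above and reports the offending values in the message.
    public static void validate(int window, int seriesLength) {
        if (window < 4) {
            throw new IllegalArgumentException(
                    "The window size " + window + " cannot be < 4");
        }
        if (window > seriesLength / 2) {
            throw new IllegalArgumentException(
                    "The window size " + window
                            + " cannot be > half the size of the series (" + seriesLength + ")");
        }
    }
}
```

With these checks, validate(4, 69) returns normally, while validate(4, 7) throws because 4 is greater than 7 / 2 = 3 in integer division.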