dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

ML.Net Anomaly detection does not seem to be flagging trend changes #921

Open olivier-tritschler opened 3 years ago

olivier-tritschler commented 3 years ago

Hi, I'm trying to design a model that identifies trend changes in data. I already implemented a spike detector and it's working fine. What I want now is a model that detects when fairly stable values start rising slowly (ramp-up), when the values jump (mean shift), or when there's a trend change.

[image]

Note: in my data sets the lines are of course not entirely straight; there is some variation around the mean values, but the general shape is what I'm describing above.
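For illustration only, the three shapes described above could be reproduced with synthetic data along these lines. This is a minimal sketch, not the actual data: the class `SyntheticSeries`, its method names, and all parameters are hypothetical, and the noise is simply uniform around the ideal shape.

```csharp
using System;

// Hypothetical helper (not part of the original code): builds noisy synthetic
// series for the ramp-up, mean-shift, and trend-inversion shapes.
public static class SyntheticSeries
{
    // Flat at `level`, then rising by `slope` per step from `changeAt` onward.
    public static float[] RampUp(int length, int changeAt, float level, float slope, float noise, int seed = 0)
        => Build(length, seed, noise, i => level + (i >= changeAt ? slope * (i - changeAt) : 0f));

    // Flat at `level`, then offset by `shift` from `changeAt` onward.
    public static float[] MeanShift(int length, int changeAt, float level, float shift, float noise, int seed = 0)
        => Build(length, seed, noise, i => level + (i >= changeAt ? shift : 0f));

    // Rising by `slope` per step until `changeAt`, then falling at the same rate.
    public static float[] TrendInversion(int length, int changeAt, float level, float slope, float noise, int seed = 0)
        => Build(length, seed, noise, i => level + slope * (i < changeAt ? i : 2 * changeAt - i));

    private static float[] Build(int length, int seed, float noise, Func<int, float> shape)
    {
        var rng = new Random(seed);
        var values = new float[length];
        for (int i = 0; i < length; i++)
        {
            // Uniform jitter in [-noise, +noise] around the ideal shape.
            float jitter = (float)(rng.NextDouble() * 2.0 - 1.0) * noise;
            values[i] = shape(i) + jitter;
        }
        return values;
    }
}
```

For example, `SyntheticSeries.RampUp(200, 100, 10f, 0.5f, 1f)` gives 200 points that hover around 10 for the first half and then climb by 0.5 per step. Because the true change index is known, it is easy to check where a detector raises its alert.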

I have so far tried 3 approaches: DetectChangePointBySsa, DetectIidChangePoint, and DetectEntireAnomalyBySrCnn.

  1. None of these identifies changes reliably. For some data sets, I get a change point for mean shifts (although I haven't figured out which parameters to tune to make detection more consistent), but for ramp-ups and trend inversions I'm not seeing any change points. What could cause this? (Code pasted below.)
  2. For data sets that don't have seasonality, is it still possible to use SSA or SrCnn?

```csharp
// Change point detection with SSA (Singular Spectrum Analysis).
IEnumerable<AnomalyPrediction> GetChangePointPredictionsWithSsa(MLContext mlContext, string path, int historyLength, double confidence, int seasonality)
{
    // Train on ten seasons' worth of data points.
    int trainingSeasons = 10;
    int trainingSize = seasonality * trainingSeasons;
    var ssaChangePointEstimator = mlContext.Transforms.DetectChangePointBySsa(
        outputColumnName: nameof(AnomalyPrediction.Prediction),
        inputColumnName: nameof(TimeSeriesData.Quantity),
        confidence: confidence,
        changeHistoryLength: historyLength,
        trainingWindowSize: trainingSize,
        seasonalityWindowSize: seasonality + 1);

    // Fit and transform on the same data set.
    var dataView = mlContext.Data.LoadFromTextFile<TimeSeriesData>(path: path, hasHeader: true, separatorChar: ',');
    ITransformer ssaChangePointTransform = ssaChangePointEstimator.Fit(dataView);
    IDataView changePointData = ssaChangePointTransform.Transform(dataView);
    var changePointPredictions = mlContext.Data.CreateEnumerable<AnomalyPrediction>(changePointData, reuseRowObject: false);
    return changePointPredictions;
}
```

```csharp
// Change point detection assuming independent, identically distributed (IID) values.
IEnumerable<AnomalyPrediction> GetChangePointPredictionsWithIid(MLContext mlContext, string path, int historyLength, double confidence)
{
    var iidTrendEstimator = mlContext.Transforms.DetectIidChangePoint(
        outputColumnName: nameof(AnomalyPrediction.Prediction),
        inputColumnName: nameof(TimeSeriesData.Quantity),
        confidence: confidence,
        changeHistoryLength: historyLength);

    // Fit on an empty data view (CreateEmptyDataView is a helper not shown here).
    ITransformer iidChangePointTransform = iidTrendEstimator.Fit(CreateEmptyDataView(mlContext));

    var dataView = mlContext.Data.LoadFromTextFile<TimeSeriesData>(path: path, hasHeader: true, separatorChar: ',');
    IDataView changePointData = iidChangePointTransform.Transform(dataView);

    var changePointPredictions = mlContext.Data.CreateEnumerable<AnomalyPrediction>(changePointData, reuseRowObject: false);
    return changePointPredictions;
}
```

```csharp
// Batch anomaly detection over the whole series with SR-CNN.
IEnumerable<AnomalyPrediction> GetAnomalyPredictions(MLContext mlContext, string path, int historyLength, double confidence)
{
    // Note: historyLength and confidence are not used by this detector.
    var dataView = mlContext.Data.LoadFromTextFile<TimeSeriesDataDouble>(path: path, hasHeader: true, separatorChar: ',');
    var outputDataView = mlContext.AnomalyDetection.DetectEntireAnomalyBySrCnn(
        dataView,
        outputColumnName: nameof(AnomalyPrediction.Prediction),
        inputColumnName: nameof(TimeSeriesDataDouble.Quantity),
        threshold: 0.3,
        batchSize: 128,
        sensitivity: 70.0,
        detectMode: SrCnnDetectMode.AnomalyOnly);

    var predictions = mlContext.Data.CreateEnumerable<AnomalyPrediction>(
        outputDataView, reuseRowObject: false);

    return predictions;
}
```

I have looked at all the samples I could find, but I haven't found pointers on what to do when data is not cyclical (seasonal) or if no changes are detected. What are the parameters I can play with to get better results?

Thanks in advance,
Olivier

olivier-tritschler commented 3 years ago

To show an example of what I mean, I plotted the change points found using all three approaches for a ramp-up scenario, and only one anomaly is flagged (out of all 3) and it's not where I'd expect it:

[image]
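For a comparison like the one plotted above, the flagged rows can be pulled out of each detector's output with a small helper. This is a minimal sketch under two assumptions: that `AnomalyPrediction.Prediction` is a `double[]`, and that element 0 of that vector is the alert / is-anomaly flag (which, as far as the ML.NET documentation describes the output vectors, holds for all three detectors used here). The class and method names are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of the original code).
public static class PredictionInspector
{
    // Returns the zero-based row indices whose first vector element is non-zero.
    // Assumption: element 0 of each prediction vector is the alert flag.
    public static List<int> FlaggedIndices(IEnumerable<double[]> predictions)
    {
        var flagged = new List<int>();
        int index = 0;
        foreach (double[] p in predictions)
        {
            // Skip missing or empty vectors instead of throwing.
            if (p != null && p.Length > 0 && p[0] != 0.0)
            {
                flagged.Add(index);
            }
            index++;
        }
        return flagged;
    }
}
```

With `System.Linq` imported, a call such as `PredictionInspector.FlaggedIndices(changePointPredictions.Select(p => p.Prediction))` would list the rows each approach flags, which makes it easier to compare them against the point where the ramp-up actually begins.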