dotnet / machinelearning-samples

Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Problem with DetectIidSpike #816

Open mfaghfoory opened 4 years ago

mfaghfoory commented 4 years ago

I want to create a spike detector as demonstrated on the sample page. Here is my code:

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class Program
    {
        private static MLContext mlContext;
        static void Main(string[] args)
        {
            mlContext = new MLContext();

            //Assign the number of records in the dataset file to a constant variable
            const int size = 36;

            //Load the data into IDataView.
            //This dataset is used when predicting/detecting spikes or changes.
            IDataView dataView = mlContext.Data.LoadFromTextFile<ProductSalesData>(path: "product-sales.csv", hasHeader: true, separatorChar: ',');

            //Detect temporary changes in the pattern
            DetectSpike(size, dataView);

            Console.WriteLine("=============== End of process, hit any key to finish ===============");

            Console.ReadLine();
        }

        static void DetectSpike(int size, IDataView dataView)
        {
            Console.WriteLine("===============Detect temporary changes in pattern===============");

            //STEP 1: Create Estimator
            var estimator = mlContext.Transforms.DetectIidSpike(outputColumnName: nameof(ProductSalesPrediction.Prediction), inputColumnName: nameof(ProductSalesData.numSales), confidence: 95, pvalueHistoryLength: size / 4);

            //STEP 2:The Transformed Model.
            //In IID Spike detection, we don't need to do training, we just need to do transformation. 
            //As you are not training the model, there is no need to load IDataView with real data, you just need schema of data.
            //So create empty data view and pass to Fit() method. 
            ITransformer transformedModel = estimator.Fit(CreateEmptyDataView());

            //STEP 3: Use/test model
            //Apply data transformation to create predictions.
            IDataView transformedData = transformedModel.Transform(dataView);
            var predictions = mlContext.Data.CreateEnumerable<ProductSalesPrediction>(transformedData, reuseRowObject: false);

            Console.WriteLine("Alert\tScore\tP-Value");
            foreach (var p in predictions)
            {
                if (p.Prediction[0] == 1)
                {
                    Console.BackgroundColor = ConsoleColor.DarkYellow;
                    Console.ForegroundColor = ConsoleColor.Black;
                }
                Console.WriteLine("{0}\t{1:0.00}\t{2:0.00}", p.Prediction[0], p.Prediction[1], p.Prediction[2]);
                Console.ResetColor();
            }
            Console.WriteLine("");
        }

        private static IDataView CreateEmptyDataView()
        {
            //Create an empty DataView. We just need the schema to call Fit()
            IEnumerable<ProductSalesData> enumerableData = new List<ProductSalesData>();
            var dv = mlContext.Data.LoadFromEnumerable(enumerableData);
            return dv;
        }
    }

    public class ProductSalesData
    {
        [LoadColumn(0)]
        public string Month;

        [LoadColumn(1)]
        public float numSales;
    }

    public class ProductSalesPrediction
    {
        //vector to hold alert,score,p-value values
        [VectorType(3)]
        public double[] Prediction { get; set; }
    }

I have attached the dataset. At line 6, the numSales value is 703.5, which is clearly a spike, but it is not detected as one. This is my result:

===============Detect temporary changes in pattern===============
Alert   Score   P-Value
0       271.00  0.50
0       150.90  0.00
0       188.10  0.41
0       124.30  0.13
0       185.30  0.47
0       703.50  0.00      --> This line is not detected as a spike
0       236.80  0.50
0       229.50  0.49
0       197.80  0.44
0       127.90  0.30
0       341.50  0.27
0       190.90  0.42
0       199.30  0.44
0       154.50  0.33
0       215.10  0.46
0       278.30  0.19
0       196.40  0.43
0       292.00  0.17
0       231.00  0.45
0       308.60  0.18
0       294.90  0.19
1       426.60  0.00
0       269.50  0.47
0       347.30  0.21
0       344.70  0.27
0       445.40  0.06
0       320.90  0.49
0       444.30  0.12
0       406.30  0.29
0       442.40  0.21
1       580.50  0.00
0       412.60  0.45
1       687.00  0.01
0       480.30  0.40
0       586.30  0.20
0       651.90  0.14

=============== End of process, hit any key to finish ===============

Kinco-dev commented 3 years ago

Hello,

I think it is because you have set the parameter "pvalueHistoryLength" to 9 (36/4).

The algorithm therefore seems to need 9 months of data before it can flag an anomaly, even if the p-value is 0. I tested this by placing the anomaly in the 7th month and then in the 8th month, and nothing was detected in either case. Only when the anomaly was in the 9th month or later did the algorithm detect it.

Example with the anomaly in the 9th month:

===============Detect temporary changes in pattern===============
Alert   Score   P-Value
0       100,00  0,50
0       120,00  0,00    =>  This line is not detected as a spike (because it is before the 9th month)
0       130,00  0,08
0       128,00  0,28
0       135,00  0,18
0       156,00  0,01
0       140,00  0,31
0       150,00  0,19
1       850,00  0,00    =>  This line is detected as a spike
0       170,00  0,50
0       154,00  0,54
0       162,00  0,53
0       147,00  0,56
0       154,50  0,55
0       215,10  0,46
0       278,30  0,37
0       196,40  0,53
0       292,00  0,39
0       231,00  0,30
0       308,60  0,09
0       294,90  0,19
1       426,60  0,01
0       269,50  0,47
0       347,30  0,21
0       344,70  0,27
0       445,40  0,06
0       320,90  0,51
0       444,30  0,12
0       406,30  0,29
0       442,40  0,21
1       580,50  0,00
0       412,60  0,45
1       687,00  0,01
0       480,30  0,40
0       586,30  0,20
0       651,90  0,14
=============== End of process, hit any key to finish ===============
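
One way to check this explanation is to rerun the detector on the original 36 values with several pvalueHistoryLength settings and compare which rows get flagged. The following is only a minimal sketch and not part of the sample: the names SalesRow, SpikeRow, HistoryLengthExperiment, and AlertRows are hypothetical, the values are copied from the Score column of the output in the question (which matches numSales there), and the trial lengths 3, 5, and 9 are arbitrary. It assumes the Microsoft.ML and Microsoft.ML.TimeSeries NuGet packages are referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row: only the column the detector reads.
public class SalesRow
{
    public float numSales;
}

// Hypothetical output row: vector holding alert, score, p-value.
public class SpikeRow
{
    [VectorType(3)]
    public double[] Prediction { get; set; }
}

public static class HistoryLengthExperiment
{
    // The 36 values from the Score column of the output posted in the question.
    public static readonly float[] Sales =
    {
        271.0f, 150.9f, 188.1f, 124.3f, 185.3f, 703.5f, 236.8f, 229.5f, 197.8f,
        127.9f, 341.5f, 190.9f, 199.3f, 154.5f, 215.1f, 278.3f, 196.4f, 292.0f,
        231.0f, 308.6f, 294.9f, 426.6f, 269.5f, 347.3f, 344.7f, 445.4f, 320.9f,
        444.3f, 406.3f, 442.4f, 580.5f, 412.6f, 687.0f, 480.3f, 586.3f, 651.9f
    };

    // Returns the 1-based row numbers flagged as spikes for a given history length.
    public static List<int> AlertRows(int pvalueHistoryLength)
    {
        var mlContext = new MLContext();

        IDataView data = mlContext.Data.LoadFromEnumerable(
            Sales.Select(v => new SalesRow { numSales = v }));

        var estimator = mlContext.Transforms.DetectIidSpike(
            outputColumnName: nameof(SpikeRow.Prediction),
            inputColumnName: nameof(SalesRow.numSales),
            confidence: 95,
            pvalueHistoryLength: pvalueHistoryLength);

        // As in the question, fit on an empty view: only the schema is needed.
        ITransformer model = estimator.Fit(
            mlContext.Data.LoadFromEnumerable(new List<SalesRow>()));

        var predictions = mlContext.Data.CreateEnumerable<SpikeRow>(
            model.Transform(data), reuseRowObject: false);

        var flagged = new List<int>();
        int row = 0;
        foreach (var p in predictions)
        {
            row++;
            if (p.Prediction[0] == 1)
            {
                flagged.Add(row);
            }
        }
        return flagged;
    }

    public static void Main()
    {
        // Arbitrary trial lengths; 9 is the value used in the question (36 / 4).
        foreach (int length in new[] { 3, 5, 9 })
        {
            Console.WriteLine("pvalueHistoryLength={0}: alerts at rows [{1}]",
                length, string.Join(", ", AlertRows(length)));
        }
    }
}
```

If the explanation above holds, a shorter history length should let row 6 (703.5) be flagged. The sketch prints the flagged rows for each setting so this can be confirmed or ruled out rather than assumed.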

mfaghfoory commented 3 years ago

Good point, I will check that