dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Auc is NaN when loading data from IEnumerable #3175

Closed: prathyusha12345 closed this issue 5 years ago

prathyusha12345 commented 5 years ago

I am getting the error below when I evaluate the model.

System.ArgumentOutOfRangeException: 'AUC is not definied when there is no negative class in the data Parameter name: NegSample'

Source code / logs

The values of the label column are true/false. I applied a transformation to the label using MapValueToKey to convert true to 1 and false to 0, but I still get the error while evaluating.

See the code below.

using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.Data.DataView;
using System.Collections.Generic;

namespace MLNETConsoleApp3
{
    class Program
    {
        static void Main()
        {
            // 1. Implement the pipeline for creating and training the model    
            var mlContext = new MLContext();
            var trainingData = GetTrainingData();
            var TestData = GetTestData();

            // 2. Specify how training data is going to be loaded into the DataView
            IDataView trainingDataView = mlContext.Data.LoadFromEnumerable(trainingData);

            // 2. Create a pipeline to prepare your data, pick your features and apply a machine learning algorithm.
            // 2a. Featurize the text into a numeric vector that can be used by the machine learning algorithm.
            var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "keyName", inputColumnName: DefaultColumnNames.Label)
                .Append(mlContext.Transforms.Text.FeaturizeText(outputColumnName: DefaultColumnNames.Features, inputColumnName: nameof(SentimentData.Text)))
                .Append(mlContext.BinaryClassification.Trainers.StochasticDualCoordinateAscent(labelColumnName: "keyName",
                                                                                               featureColumnName: DefaultColumnNames.Features))
                .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: DefaultColumnNames.Label, inputColumnName: "keyName"));

            var transformedData_default = pipeline.Fit(trainingDataView).Transform(trainingDataView);
            var preViewTransformedData = transformedData_default.Preview(maxRows: 4);

            foreach (var row in preViewTransformedData.RowView)
            {
                var ColumnCollection = row.Values;
                string lineToPrint = "Row--> ";
                foreach (KeyValuePair<string, object> column in ColumnCollection)
                {
                    lineToPrint += $"| {column.Key}:{column.Value}";
                }
                Console.WriteLine(lineToPrint + "\n");
            }

            // 3. Get a model by training the pipeline that was built.
            Console.WriteLine("Creating and Training a model for Sentiment Analysis using ML.NET");
            ITransformer model = pipeline.Fit(trainingDataView);

            // 4. Evaluate the model to see how well it performs on different dataset (test data).
            Console.WriteLine("Training of model is complete \nEvaluating the model with test data");

            IDataView testDataView = mlContext.Data.LoadFromEnumerable(TestData);
            var predictions = model.Transform(testDataView);
            var results = mlContext.BinaryClassification.Evaluate(predictions);
            Console.WriteLine($"Accuracy: {results.Accuracy:P2}");

            // 5. Use the model for making a single prediction.
            var predictionEngine = model.CreatePredictionEngine<SentimentData, SentimentPrediction>(mlContext);
            var testInput = new SentimentData { Text = "ML.NET is fun, more samples at https://github.com/dotnet/machinelearning-samples" };
            SentimentPrediction resultprediction = predictionEngine.Predict(testInput);

            /* This template uses a minimal dataset to build a sentiment analysis model which leads to relatively low accuracy. 
             * Building good Machine Learning models requires large volumes of data. This template comes with a minimal dataset (Data/wikipedia-detox) for sentiment analysis. 
             * In order to build a sentiment analysis model with higher accuracy please follow the walkthrough at https://aka.ms/mlnetsentimentanalysis/. */
            Console.WriteLine($"Predicted sentiment for \"{testInput.Text}\" is: {(resultprediction.Prediction ? "Positive" : "Negative")}");

            // 6. Save the model to file so it can be used in another app.
            Console.WriteLine("Saving the model");

            using (var fs = new FileStream("sentiment_model.zip", FileMode.Create, FileAccess.Write, FileShare.Write))
            {
                model.SaveTo(mlContext, fs);
            }

            Console.ReadLine();
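        }

        // SentimentData, SentimentPrediction, GetTrainingData() and GetTestData() are posted in a later comment.
    }
}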
wschin commented 5 years ago

Does your test data contain negative labels? It's possible that your test data is so small that AUC is not a well-defined metric (to compute AUC, we need at least one positive and one negative label).
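To see why both classes are needed: ROC AUC can be read as the fraction of (positive, negative) pairs in which the positive example receives the higher score. The C# sketch below computes it that way; the class and method names are hypothetical, and this is an illustration, not ML.NET's internal implementation. When every label is true there are zero pairs, the ratio is 0/0, and the sketch returns NaN, which is the situation in which ML.NET raised the exception quoted above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pairwise (Mann-Whitney style) ROC AUC.
// AUC = P(score of a random positive > score of a random negative), ties count as 0.5.
public static class AucSketch
{
    public static double ComputeAuc(IReadOnlyList<bool> labels, IReadOnlyList<double> scores)
    {
        if (labels.Count != scores.Count)
            throw new ArgumentException("labels and scores must have the same length");

        // Split the scores by class.
        var positives = new List<double>();
        var negatives = new List<double>();
        for (int i = 0; i < labels.Count; i++)
        {
            if (labels[i]) positives.Add(scores[i]);
            else negatives.Add(scores[i]);
        }

        // No negatives (or no positives) means no pairs to compare:
        // the ratio below would be 0/0, so AUC is undefined.
        if (positives.Count == 0 || negatives.Count == 0)
            return double.NaN;

        // Count how often a positive outranks a negative.
        double wins = 0;
        foreach (var p in positives)
            foreach (var n in negatives)
                wins += p > n ? 1.0 : (p == n ? 0.5 : 0.0);

        return wins / ((double)positives.Count * negatives.Count);
    }
}
```

For example, `AucSketch.ComputeAuc(new[] { true, true, true, true }, new[] { 0.9, 0.2, 0.8, 0.7 })` returns NaN, while adding a single negative label makes the result well defined.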

prathyusha12345 commented 5 years ago

@wschin The data classes are as follows:

        public class SentimentData
        {
            public bool Label { get; set; }
            public string Text { get; set; }

            //// Additional property for testing purpose
            //public string Expected { get; set; }
        }

        public class SentimentPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool Prediction { get; set; }
            public float Probability { get; set; }
            public float Score { get; set; }
        }

The training and test data are as follows:


        public static List<SentimentData> GetTrainingData()
        {
            return new List<SentimentData>
            {
                new SentimentData
                {
                    Label = true,
                    Text = "Good service."
                },
                new SentimentData
                {
                    Label = true,
                    Text = "Very good service"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "Amazing service"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "Great staff, will visit again. thanks for the gift"
                },
                new SentimentData
                {
                    Label = false,
                    Text = "Bad staff, bad service. Will never visit this hotel"
                },
                new SentimentData
                {
                    Label = false,
                    Text = "The service was very bad"
                },
                new SentimentData
                {
                    Label = false,
                    Text = "Hotel location is worst"
                }
            };
        }

        public static List<SentimentData> GetTestData()
        {
            return new List<SentimentData>
            {
                new SentimentData
                {
                    Label = true,
                    Text = "Worst hotel in New York"
                    //Expected = "Negative"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "I ordered pizza and recieved Wine. Bad staff"
                    //,
                    //Expected = "Negative"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "The hotel was so amazing, and they givena bag to me on gift"
                    //,
                    //Expected = "Positive"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "The hotel staff was great, will visit again"
                    //,
                    //Expected = "Positive"
                }
            };
        }
Ivanidzo4ka commented 5 years ago

Don't apply MapValueToKey to the label column. Binary classification works directly on boolean labels. In 0.12 (which should be out today or tomorrow), boolean is the only acceptable type for a binary classification label.
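A minimal sketch of the pipeline with the boolean label used directly (no MapValueToKey/MapKeyToValue). It assumes the Microsoft.ML 1.x NuGet package rather than the 0.11/0.12 previews used in this thread, so names differ from the original code: `SdcaLogisticRegression` is assumed here as the 1.x counterpart of `StochasticDualCoordinateAscent`, and the column names are plain strings instead of `DefaultColumnNames`. `BoolLabelPipelineSketch` and `TrainAndEvaluateAuc` are hypothetical names.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Same shape as the SentimentData class in this thread: a bool label plus text.
public class SentimentData
{
    public bool Label { get; set; }   // consumed as-is by the binary trainer
    public string Text { get; set; }
}

public static class BoolLabelPipelineSketch
{
    // Assumes Microsoft.ML 1.x. Trains on `train` and returns the ROC AUC on `test`.
    public static double TrainAndEvaluateAuc(IEnumerable<SentimentData> train, IEnumerable<SentimentData> test)
    {
        var mlContext = new MLContext(seed: 0);
        IDataView trainView = mlContext.Data.LoadFromEnumerable(train);
        IDataView testView = mlContext.Data.LoadFromEnumerable(test);

        // Featurize the text, then train directly on the boolean "Label" column.
        var pipeline = mlContext.Transforms.Text
            .FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(SentimentData.Text))
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: nameof(SentimentData.Label), featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(trainView);

        // Evaluate on the test set; AUC needs both classes to be present there.
        var metrics = mlContext.BinaryClassification.Evaluate(
            model.Transform(testView), labelColumnName: nameof(SentimentData.Label));
        return metrics.AreaUnderRocCurve;
    }
}
```

Removing the key mapping fixes the label type, but it does not fix the metric: with the all-positive GetTestData() set posted above, the evaluation still cannot produce a meaningful AUC.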

prathyusha12345 commented 5 years ago

@Ivanidzo4ka OK. Actually, there was an issue created in our samples repo here. While working on that, I tried the MapKeyToValue transformation to convert the label values true/false to numeric values. If I remove that MapKeyValue() mapping and upgrade to v0.12, will this sample work?

Ivanidzo4ka commented 5 years ago

No, because your test data contains only positive examples, and it's pointless to calculate metrics on data where only one class is present.
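One cheap way to catch this before calling Evaluate is to count the labels of the evaluation set first. A minimal sketch; `EvaluationGuard`, `CountClasses` and `EnsureBothClasses` are hypothetical helpers, not ML.NET APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: verify that an evaluation set contains both classes
// before asking for metrics such as AUC.
public static class EvaluationGuard
{
    // Counts true (positive) and false (negative) labels.
    public static (int Positives, int Negatives) CountClasses(IEnumerable<bool> labels)
    {
        int pos = 0, neg = 0;
        foreach (var label in labels)
        {
            if (label) pos++;
            else neg++;
        }
        return (pos, neg);
    }

    // Throws if either class is missing, with the counts in the message.
    public static void EnsureBothClasses(IEnumerable<bool> labels)
    {
        var (pos, neg) = CountClasses(labels);
        if (pos == 0 || neg == 0)
            throw new InvalidOperationException(
                $"Evaluation data needs both classes; found {pos} positive and {neg} negative labels.");
    }
}
```

Called as `EvaluationGuard.EnsureBothClasses(GetTestData().Select(d => d.Label));` (with `using System.Linq;`), it would throw for the all-positive test set posted above and pass for a set that includes at least one negative example.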

wschin commented 5 years ago

Try this test data; you should then get an AUC value.

        public static List<SentimentData> GetTestData()
        {
            return new List<SentimentData>
            {
                new SentimentData
                {
                    Label = true,
                    Text = "Worst hotel in New York"
                    //Expected = "Negative"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "I ordered pizza and recieved Wine. Bad staff"
                    //,
                    //Expected = "Negative"
                },
                new SentimentData
                {
                    Label = true,
                    Text = "The hotel was so amazing, and they givena bag to me on gift"
                    //,
                    //Expected = "Positive"
                },
                new SentimentData
                {
                    Label = false,
                    Text = "Sadly, a negative data point."
                    //,
                    //Expected = "Negative"
                }
            };
        }
prathyusha12345 commented 5 years ago

OK. Thanks for the correction. I am closing the issue.