dotnet / docs

This repository contains .NET Documentation.
https://learn.microsoft.com/dotnet
Creative Commons Attribution 4.0 International

Getting Scores and Labels for prediction in multiclass classification with ML .NET #14265

Open dliedke opened 5 years ago

dliedke commented 5 years ago

Hello! I would recommend that this tutorial also show the labels and scores when predicting, for example:

Area: area-System.Data Score: 46.87811%
Area: area-Infrastructure Score: 27.88144%
Area: area-System.Security Score: 6.85226%
Area: area-Meta Score: 4.304262%
Area: area-Serialization Score: 3.70437%
Area: area-System.Net Score: 1.533578%
Area: area-System.ComponentModel Score: 1.258625%
Area: area-System.Reflection Score: 1.07564%
Area: area-System.Runtime Score: 0.8889261%
Area: area-System.Drawing Score: 0.8468383%
Area: area-Microsoft.CSharp Score: 0.8202795%
Area: area-System.Collections Score: 0.7278437%
Area: area-System.Xml Score: 0.6647807%
Area: area-System.Diagnostics Score: 0.4810925%
Area: area-System.Globalization Score: 0.3920802%
Area: area-System.Memory Score: 0.3072678%
Area: area-System.Threading Score: 0.2958634%
Area: area-System.Linq Score: 0.2904106%
Area: area-System.Console Score: 0.2705316%
Area: area-System.Text Score: 0.2450379%
Area: area-System.IO Score: 0.220165%
Area: area-System.Numerics Score: 0.06059819%

I use the following code:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public class IssuePrediction
{
    [ColumnName("PredictedLabel")]
    public string Area;

    [ColumnName("Score")]
    public float[] Score { get; set; }
}

        public static void PredictIssue()
        {
            // <SnippetLoadModel>
            ITransformer loadedModel = _mlContext.Model.Load(_modelPath, out var modelInputSchema);            
            // </SnippetLoadModel>

            // <SnippetAddTestIssue> 
            GitHubIssue singleIssue = new GitHubIssue() { Title = "Entity Framework crashes", Description = "When connecting to the database, EF is crashing" };
            // </SnippetAddTestIssue> 

            //Predict label for single hard-coded issue
            // <SnippetCreatePredictionEngine>
            _predEngine = _mlContext.Model.CreatePredictionEngine<GitHubIssue, IssuePrediction>(loadedModel);
            // </SnippetCreatePredictionEngine>

            // <SnippetPredictIssue>
            var prediction = _predEngine.Predict(singleIssue);
            // </SnippetPredictIssue>

            // <SnippetDisplayResults>
            Console.WriteLine($"=============== Single Prediction - Result: {prediction.Area} ===============");
            // </SnippetDisplayResults>

            // Show score of each area ordered by score desc
            var scoreEntries = GetScoresWithLabelsSorted(_predEngine.OutputSchema, "Score", prediction.Score);
            foreach (var scoreEntry in scoreEntries)
            {
                Console.WriteLine($"Area: {scoreEntry.Key} Score: {scoreEntry.Value * 100}%");
            }
        }

        private static Dictionary<string, float> GetScoresWithLabelsSorted(DataViewSchema schema, string name, float[] scores)
        {
            Dictionary<string, float> result = new Dictionary<string, float>();

            // Look up the score column; GetColumnOrNull returns null if it does not exist
            var column = schema.GetColumnOrNull(name);
            if (column == null)
            {
                throw new ArgumentException($"Column '{name}' was not found in the schema.", nameof(name));
            }

            // The slot names of the score column hold the label for each score position
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            column.Value.GetSlotNames(ref slotNames);

            var num = 0;
            foreach (var denseValue in slotNames.DenseValues())
            {
                result.Add(denseValue.ToString(), scores[num++]);
            }

            // Re-insert the entries in descending score order
            return result.OrderByDescending(c => c.Value).ToDictionary(i => i.Key, i => i.Value);
        }
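
The helper above returns a Dictionary, and the .NET documentation leaves the enumeration order of Dictionary<TKey, TValue> undefined. A minimal sketch of a variant that returns an ordered list instead, assuming the label names have already been read from the slot names as shown above (the ScoreRanking class and Rank method are hypothetical names, not part of ML.NET):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScoreRanking
{
    // Pairs each label with the score at the same index and returns the
    // pairs sorted by score, highest first. A List keeps this order.
    public static List<KeyValuePair<string, float>> Rank(
        IReadOnlyList<string> labels, IReadOnlyList<float> scores)
    {
        if (labels.Count != scores.Count)
        {
            throw new ArgumentException("labels and scores must have the same length.");
        }

        return labels
            .Select((label, i) => new KeyValuePair<string, float>(label, scores[i]))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

Calling ScoreRanking.Rank(labels, prediction.Score).Take(3) would then give the three most likely areas, in order.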

This should help a lot of people!

Thank you!

Daniel Liedke


luisquintanilla commented 5 years ago

Thanks for the suggestion, @dliedke. While this information is helpful for someone looking to go deeper into understanding the predictions, it adds some complexity to the tutorial that goes beyond getting started.

rossnoe commented 4 years ago

@dliedke Your labels and scores addition was very helpful. Saved me the trouble of having to write it myself. Thank you.

dliedke commented 4 years ago

Great to hear and happy to help!! Daniel

rossnoe commented 4 years ago

@dliedke, I have a question. I am using the PredictionEnginePool, as recommended here. When I use PredictionEnginePool, I am not able to get the OutputSchema as you did in your code (_predEngine.OutputSchema). I tried _predictionEnginePool.OutputSchema and it does not work. What do you recommend?

rossnoe commented 4 years ago

Actually, I realized that I can simply do something like _mlContext.Model.Save(loadedModel, _predEngine.OutputSchema, _schemaPath); and save the schema to a zip file (similar to the model). Then I have it available for a Web API or another project.
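
Another possible approach, sketched here under assumptions rather than confirmed anywhere in this thread: compute the output schema from the loaded model with ITransformer.GetOutputSchema, read the slot names once at startup, and keep them as a plain string array. This assumes the "Score" column in the model's output schema carries the same slot names as _predEngine.OutputSchema. LabelNameCache and Load are hypothetical names.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class LabelNameCache
{
    // Loads the model, derives its output schema from the stored input schema,
    // and returns the slot names of the score column as label names.
    public static string[] Load(MLContext mlContext, string modelPath, string scoreColumn = "Score")
    {
        ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);
        DataViewSchema outputSchema = model.GetOutputSchema(inputSchema);

        DataViewSchema.Column? column = outputSchema.GetColumnOrNull(scoreColumn);
        if (column == null)
        {
            throw new InvalidOperationException(
                $"Column '{scoreColumn}' was not found in the model output schema.");
        }

        // Assumption: the score column has slot names attached.
        VBuffer<ReadOnlyMemory<char>> slotNames = default;
        column.Value.GetSlotNames(ref slotNames);

        return slotNames.DenseValues().Select(name => name.ToString()).ToArray();
    }
}
```

The returned array can be registered once (for example as a singleton) and indexed alongside prediction.Score, so no schema access is needed at prediction time.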

wmundstock commented 4 years ago

@dliedke I spent quite some time trying to do this myself and luckily I found your code! Thanks for sharing!

dliedke commented 4 years ago

Great! Thanks, Walter! Have fun!

JeanCollas commented 3 years ago

@luisquintanilla I do not agree with you... that is what I spent the most time looking for in the documentation... This little function of @dliedke's should be in the documentation (or in the objects, or linkable to the output objects)

static Dictionary<string, float> GetScoresWithLabelsSorted(DataViewSchema schema, string name, float[] scores)
{
    Dictionary<string, float> result = new Dictionary<string, float>();

    var column = schema.GetColumnOrNull(name);

    var slotNames = new VBuffer<ReadOnlyMemory<char>>();
    column.Value.GetSlotNames(ref slotNames);
    var num = 0;
    foreach (var denseValue in slotNames.DenseValues())
    {
        result.Add(denseValue.ToString(), scores[num++]);
    }

    return result.OrderByDescending(c => c.Value).ToDictionary(i => i.Key, i => i.Value);
}

pawod commented 3 years ago

I also agree with @JeanCollas and @dliedke; it took me hours to figure out how this works. The docs do not mention how to map the results of a multiclass estimator to the corresponding labels. Very frustrating.

luisquintanilla commented 3 years ago

Thanks for all your feedback, @pawod @JeanCollas @dliedke. We're in the process of updating our documentation and will prioritize this change.