dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

SlotNames behave differently based on column type #6087

Open luisquintanilla opened 2 years ago

luisquintanilla commented 2 years ago

System Information (please complete the following information):

Describe the bug

For multiclass classification problems, SlotNames are only available when the columns are of type string. Even though the value and its meaning are the same, the SlotNames behavior differs based on the data type.

To Reproduce

Steps to reproduce the behavior:

  1. Train a multiclass classification model
  2. Map labels to scores. You can do it with code similar to the following:
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using myMLApp;

// Add input data
var sampleData = new SentimentModel.ModelInput()
{
    Col0 = "This restaurant was wonderful."
};

// Load model and predict output of sample data
var result = SentimentModel.Predict(sampleData);

// If PredictedLabel is 1, sentiment is "Positive"; otherwise, sentiment is "Negative"
string sentiment = result.PredictedLabel == "1" ? "Positive" : "Negative";
Console.WriteLine($"Text: {sampleData.Col0}\nSentiment: {sentiment}");

var sortedLabels = GetScoresWithLabelsSorted(SentimentModel.PredictEngine.Value.OutputSchema, nameof(result.Score), result.Score);

foreach(var (k,v) in sortedLabels)
{
    Console.WriteLine($"{k}: {v}");
}

static Dictionary<string, float> GetScoresWithLabelsSorted(DataViewSchema schema, string name, float[] scores)
{
    Dictionary<string, float> result = new Dictionary<string, float>();

    var column = schema.GetColumnOrNull(name);

    // Read the SlotNames annotation attached to the score column
    var slotNames = new VBuffer<ReadOnlyMemory<char>>();
    column.Value.GetSlotNames(ref slotNames);

    // Pair each slot name with the score at the same index
    var num = 0;
    foreach (var denseValue in slotNames.DenseValues())
    {
        result.Add(denseValue.ToString(), scores[num++]);
    }

    return result.OrderByDescending(c => c.Value).ToDictionary(i => i.Key, i => i.Value);
}
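
As a side note on the label-to-score mapping step, here is a minimal sketch of the pairing and sorting logic on its own, without any ML.NET dependency. `ScoreLabeler` and `PairAndSort` are hypothetical names, and the fallback to slot indices when no labels are supplied is an assumption made for illustration, not something ML.NET does. It returns a list rather than a `Dictionary`, since the enumeration order of `Dictionary<TKey, TValue>` is documented as undefined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ScoreLabeler
{
    // Pairs each score with its label and sorts by score, highest first.
    // If labels is null or its length does not match scores (for example,
    // when no SlotNames could be read), the slot index is used as the label.
    // That fallback is an assumption of this sketch.
    public static List<KeyValuePair<string, float>> PairAndSort(
        IReadOnlyList<string> labels, float[] scores)
    {
        bool hasLabels = labels != null && labels.Count == scores.Length;

        return scores
            .Select((score, i) => new KeyValuePair<string, float>(
                hasLabels ? labels[i] : i.ToString(), score))
            .OrderByDescending(pair => pair.Value)
            .ToList();
    }
}
```

In the repro above, the labels would presumably come from `slotNames.DenseValues().Select(v => v.ToString()).ToArray()`.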

Expected behavior

Not sure? I would think that if the value and the "meaning" of that value are the same, the type shouldn't matter and SlotNames should be made available.

luisquintanilla commented 2 years ago

Not sure if it was ever addressed, but here are additional signs that SlotNames are intended to support types other than string: https://github.com/dotnet/machinelearning/issues/2810#issuecomment-468813505

JakeRadMSFT commented 1 year ago

Has something to do with:

https://github.com/dotnet/machinelearning/blob/main/src/Microsoft.ML.Data/Scorers/MulticlassClassificationScorer.cs#L477

JakeRadMSFT commented 1 year ago

Related to the SlotNames issues in https://github.com/dotnet/machinelearning-modelbuilder/issues/2418