dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

'Label' not found #6730

Open · thoron opened this issue 1 year ago

thoron commented 1 year ago

System Information (please complete the following information):

Describe the bug

AutoML throws an error saying that the 'Label' column is missing from the schema when the data is loaded from a text (CSV) file without a header row.

Row example:

5;0.968795895576477;0.8838793039321899;1.0125187635421753;1.0380022525787354;1.003713607788086;0.8773788213729858;0.7508044838905334;0.7412265539169312;0.7468504905700684;0.7589845061302185;0.7755808234214783;0.7674760818481445;0.6741359829902649;0.6582905054092407;0.6679562926292419;0.6805443167686462;0.6613132357597351;0.48050951957702637;0.5232967138290405;0.5599182844161987;0.522437334060669;0.5111487507820129;0.5027106404304504

To Reproduce

using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var ctx = new MLContext(seed: 1);

// Headerless, semicolon-separated file: column 0 is the label,
// columns 1..29 are loaded as a single Features vector.
var opts = new TextLoader.Options
{
    HasHeader = false,
    Columns = new[]
    {
        new TextLoader.Column("Label", DataKind.UInt32, 0),
        new TextLoader.Column("Features", DataKind.Single, 1, 29)
    },
    Separators = new[] { ';' },
};
var loader = ctx.Data.CreateTextLoader(opts);
var data = loader.Load(@"C:\test.csv");
var trainValidationData = ctx.Data.TrainTestSplit(data, testFraction: 0.2);

// Featurize, convert the label to a key type, then sweep multiclass trainers.
var pipeline = ctx.Auto()
    .Featurizer(data)
    .Append(ctx.Transforms.Conversion.MapValueToKey("Label"))
    .Append(ctx.Auto().MultiClassification());

// Run() throws "label column 'Label' not found".
var xx = ctx.Auto()
    .CreateExperiment()
    .SetPipeline(pipeline)
    .SetMulticlassClassificationMetric(MulticlassClassificationMetric.MacroAccuracy)
    .SetTrainingTimeInSeconds(60)
    .SetDataset(trainValidationData)
    .Run();

Removing the Featurizer step makes no difference; the same error occurs:

var pipeline = ctx.Transforms.Conversion.MapValueToKey("Label")
    .Append(ctx.Auto().MultiClassification());

Both pipelines fail with this error:

System.AggregateException : One or more errors occurred. (label column 'Label' not found (Parameter 'schema'))
  ----> System.ArgumentOutOfRangeException : label column 'Label' not found (Parameter 'schema')
Data:
  ML_IsMarked: 1
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.ML.AutoML.AutoMLExperiment.Run()

The loaded data looks as expected when previewed. [screenshot of the loaded data]

Expected behavior

AutoML should find the 'Label' column in the schema, since it is declared explicitly in the loader options.

This might be related to the missing header row and/or to not using InferColumns. The schema looks correct when inspected manually at runtime. Am I missing something?

fwaris commented 3 months ago

I am running into a similar problem. In my case, the experiment uses binary classification.

It seems that the IDataView the evaluator receives does not contain the 'Label' column:

System.ArgumentOutOfRangeException: label column 'Label' not found (Parameter 'schema')
   at Microsoft.ML.Data.RoleMappedSchema.MapFromNames(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedSchema..ctor(DataViewSchema schema, IEnumerable`1 roles, Boolean opt)
   at Microsoft.ML.Data.RoleMappedData..ctor(IDataView data, Boolean opt, KeyValuePair`2[] roles)
   at Microsoft.ML.Data.BinaryClassifierEvaluator.Evaluate(IDataView data, String label, String score, String predictedLabel)

fwaris commented 3 months ago

I dug deeper into the AutoML code and found that the label column name passed to the evaluator is always 'label' (lower case).


Renaming "Label" to "label" everywhere fixed the issue for me.
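Applied to the original reproduction, that rename workaround might look roughly like the sketch below. It is unverified: it assumes the sweepable `MultiClassification` extension accepts a `labelColumnName` argument (whose default is assumed to be "Label", which would explain the mismatch), and it keeps the reporter's placeholder path `C:\test.csv` and column layout unchanged.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

var ctx = new MLContext(seed: 1);

// Same layout as the reproduction, but the label column is named "label"
// (lower case) so it matches the name the AutoML evaluator looks up.
var opts = new TextLoader.Options
{
    HasHeader = false,
    Separators = new[] { ';' },
    Columns = new[]
    {
        new TextLoader.Column("label", DataKind.UInt32, 0),
        new TextLoader.Column("Features", DataKind.Single, 1, 29),
    },
};
var data = ctx.Data.CreateTextLoader(opts).Load(@"C:\test.csv");
var trainValidationData = ctx.Data.TrainTestSplit(data, testFraction: 0.2);

// Every stage that refers to the label uses the same lower-case name.
// Assumption: MultiClassification exposes a labelColumnName parameter.
var pipeline = ctx.Transforms.Conversion.MapValueToKey("label")
    .Append(ctx.Auto().MultiClassification(labelColumnName: "label"));

var result = ctx.Auto()
    .CreateExperiment()
    .SetPipeline(pipeline)
    .SetMulticlassClassificationMetric(MulticlassClassificationMetric.MacroAccuracy)
    .SetTrainingTimeInSeconds(60)
    .SetDataset(trainValidationData)
    .Run();
```

If the ML.NET version in use lets `SetMulticlassClassificationMetric` take the label column name explicitly, passing "Label" there may be a less invasive alternative to renaming; that is an assumption worth checking against the AutoML API reference.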