dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

System.ArgumentOutOfRangeException: 'Score column 'Score' not found (Parameter 'schema')' #6718

Closed: tcaivano closed this issue 1 year ago

tcaivano commented 1 year ago

Describe the bug: When calling mlContext.BinaryClassification.Evaluate() without specifying the score column, I get an exception: System.ArgumentOutOfRangeException: 'Score column 'Score' not found (Parameter 'schema')'

To reproduce:

public class AuthEventTransform
{
    [LoadColumn(0)]
    public int TimeStamp { get; set; }
    [LoadColumn(1)]
    public string? SourceUser { get; set; }
    [LoadColumn(2)]
    public string? DestinationUser { get; set; }
    [LoadColumn(3)]
    public string? SourceComputer { get; set; }
    [LoadColumn(4)]
    public string? DestinationComputer { get; set; }
    [LoadColumn(5)]
    public string? LogonType { get; set; }
    [LoadColumn(6)]
    public bool AuthenticationOrientation { get; set; }
    [LoadColumn(7)]
    public bool IsSuccessful { get; set; }
    [LoadColumn(8)]
    public bool IsRedTeam { get; set; }
}

            MLContext mlContext = new MLContext();
            IDataView trainingdata = mlContext.Data.LoadFromTextFile<AuthEventTransform>(truthFileLocation, hasHeader: false, separatorChar: ',');
            IDataView testDataView = mlContext.Data.LoadFromTextFile<AuthEventTransform>(testFileLocation, hasHeader: false, separatorChar: ';');
            var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(new[] {
                new InputOutputColumnPair(nameof(AuthEventTransform.SourceUser), nameof(AuthEventTransform.SourceUser)),
                new InputOutputColumnPair(nameof(AuthEventTransform.DestinationUser), nameof(AuthEventTransform.DestinationUser)),
                new InputOutputColumnPair(nameof(AuthEventTransform.SourceComputer), nameof(AuthEventTransform.SourceComputer)),
                new InputOutputColumnPair(nameof(AuthEventTransform.DestinationComputer), nameof(AuthEventTransform.DestinationComputer)),
                new InputOutputColumnPair(nameof(AuthEventTransform.LogonType), nameof(AuthEventTransform.LogonType)),
                new InputOutputColumnPair(nameof(AuthEventTransform.AuthenticationOrientation), nameof(AuthEventTransform.AuthenticationOrientation)),
                new InputOutputColumnPair(nameof(AuthEventTransform.IsSuccessful), nameof(AuthEventTransform.IsSuccessful))
            });
            pipeline.Append(mlContext.Transforms.Concatenate("Features",
                nameof(AuthEventTransform.SourceUser),
                nameof(AuthEventTransform.DestinationUser),
                nameof(AuthEventTransform.SourceComputer),
                nameof(AuthEventTransform.DestinationComputer),
                nameof(AuthEventTransform.LogonType),
                nameof(AuthEventTransform.AuthenticationOrientation),
                nameof(AuthEventTransform.IsSuccessful)));
            pipeline.Append(mlContext.BinaryClassification.Trainers.FastTree(new FastTreeBinaryTrainer.Options() { NumberOfLeaves = 4, MinimumExampleCountPerLeaf = 20, NumberOfTrees = 4, MaximumBinCountPerFeature = 254, FeatureFraction = 1, LearningRate = 0.1, LabelColumnName = @"IsRedTeam", FeatureColumnName = @"Features" }));
            pipeline.AppendCacheCheckpoint(mlContext);

            ITransformer trainedModel = pipeline.Fit(trainingdata);

            var predictions = trainedModel.Transform(testDataView);
            var metrics = mlContext.BinaryClassification.Evaluate(predictions, "IsRedTeam");

Expected behavior: Evaluate() works without the Score column being specified explicitly.

tcaivano commented 1 year ago

I can confirm that my schema appears to be incomplete (screenshot not included).

tcaivano commented 1 year ago

And I'm still getting this issue when I explicitly redefine the pipeline:

var pipeline = mlContext.Transforms.CopyColumns(outputColumnName: "Label", inputColumnName: "IsRedTeam");
            pipeline.Append(mlContext.Transforms.Categorical.OneHotEncoding("SourceUserEncoded", "SourceUser"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("DestinationUserEncoded", "DestinationUser"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("SourceComputerEncoded", "SourceComputer"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("DestinationComputerEncoded", "DestinationComputer"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("LogonTypeEncoded", "LogonType"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("AuthenticationOrientationEncoded", "AuthenticationOrientation"))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("IsSuccessfulEncoded", "IsSuccessful"))
            .Append(mlContext.Transforms.Concatenate("Features", "SourceUserEncoded", "DestinationUserEncoded", "SourceComputerEncoded", "DestinationComputerEncoded", "LogonTypeEncoded", "AuthenticationOrientationEncoded", "IsSuccessfulEncoded"));

            pipeline.Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumnName: @"IsRedTeam", featureColumnName: @"Features"));

            var trainedModel = pipeline.Fit(trainingdata);
            var predictions = trainedModel.Transform(testDataView);
            var schema = predictions.Schema;

            var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictions, "IsRedTeam");

feiyun0112 commented 1 year ago

This happens because pipeline.Append does not modify the pipeline in place; it creates and returns a new estimator. You need to assign the return value back to pipeline:

IEstimator<ITransformer> pipeline = ...

pipeline = pipeline.Append(...);
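
Applied to the code from the report, the reassignment could look roughly like the sketch below. It is not taken from the thread: the PipelineFix class and TrainAndEvaluate method are hypothetical names, the "...Encoded" column names follow the second snippet in the report, and it assumes the Microsoft.ML and Microsoft.ML.FastTree packages are referenced and that both data views were loaded with the AuthEventTransform schema.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical helper class, for illustration only.
public static class PipelineFix
{
    public static CalibratedBinaryClassificationMetrics TrainAndEvaluate(
        MLContext mlContext, IDataView trainingData, IDataView testData)
    {
        // Declare the variable as IEstimator<ITransformer> so that the result
        // of every Append call can be assigned back to it.
        IEstimator<ITransformer> pipeline = mlContext.Transforms.Categorical.OneHotEncoding(new[]
        {
            // InputOutputColumnPair takes (outputColumnName, inputColumnName).
            new InputOutputColumnPair("SourceUserEncoded", "SourceUser"),
            new InputOutputColumnPair("DestinationUserEncoded", "DestinationUser"),
            new InputOutputColumnPair("SourceComputerEncoded", "SourceComputer"),
            new InputOutputColumnPair("DestinationComputerEncoded", "DestinationComputer"),
            new InputOutputColumnPair("LogonTypeEncoded", "LogonType"),
            new InputOutputColumnPair("AuthenticationOrientationEncoded", "AuthenticationOrientation"),
            new InputOutputColumnPair("IsSuccessfulEncoded", "IsSuccessful")
        });

        // Append returns a new estimator chain; keep the returned value.
        pipeline = pipeline.Append(mlContext.Transforms.Concatenate("Features",
            "SourceUserEncoded", "DestinationUserEncoded", "SourceComputerEncoded",
            "DestinationComputerEncoded", "LogonTypeEncoded",
            "AuthenticationOrientationEncoded", "IsSuccessfulEncoded"));

        // The trainer is now actually part of the pipeline that gets fitted.
        pipeline = pipeline.Append(mlContext.BinaryClassification.Trainers.FastTree(
            labelColumnName: "IsRedTeam", featureColumnName: "Features"));

        ITransformer trainedModel = pipeline.Fit(trainingData);
        IDataView predictions = trainedModel.Transform(testData);

        // Optional guard: fail early with a clearer message if the trainer
        // output is missing from the schema.
        if (predictions.Schema.GetColumnOrNull("Score") is null)
        {
            throw new InvalidOperationException(
                "No 'Score' column in the predictions; was the trainer appended to the pipeline?");
        }

        return mlContext.BinaryClassification.Evaluate(predictions, labelColumnName: "IsRedTeam");
    }
}
```

In the original code the value left in pipeline was only the first estimator, so the fitted model contained no trainer and its output had no Score column for Evaluate to read. Chaining the calls fluently (first.Append(...).Append(...)) and assigning the final result once has the same effect as reassigning after each Append.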