dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Using PFI with AutoML, possible? #3972

Closed famschopman closed 4 years ago

famschopman commented 5 years ago

Playing with AutoML and so far having much fun with it.

I have a trained model and am now trying to retrieve the feature weights. None of the objects returned exposes the LastTransformer property that I need to pass to PermutationFeatureImportance.

Code snippet:

var mlContext = new MLContext();
var _appPath = AppDomain.CurrentDomain.BaseDirectory;
var _dataPath = Path.Combine(_appPath, "Datasets", "dataset.csv");
var _modelPath = Path.Combine(_appPath, "Datasets", "TrainedModels");

ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(_dataPath, LabelColumnName, groupColumns: false);
ColumnInformation columnInformation = columnInference.ColumnInformation;

TextLoader textLoader = mlContext.Data.CreateTextLoader(columnInference.TextLoaderOptions);
IDataView data = textLoader.Load(_dataPath);

DataOperationsCatalog.TrainTestData dataSplit = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
IDataView trainData = dataSplit.TrainSet;
IDataView testData = dataSplit.TestSet;

var cts = new CancellationTokenSource();
var experimentSettings = CreateExperimentSettings(mlContext, cts);

var progressHandler = new BinaryExperimentProgressHandler();

ExperimentResult<BinaryClassificationMetrics> experimentResult = mlContext.Auto()
    .CreateBinaryClassificationExperiment(experimentSettings)
    .Execute(trainData, labelColumnName: "Attrition", progressHandler: progressHandler);

RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;
ITransformer trainedModel = bestRun.Model;
var predictions = trainedModel.Transform(testData);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(data: predictions, labelColumnName: "Attrition", scoreColumnName: "Score");

mlContext.Model.Save(trainedModel, trainData.Schema, _modelPath);

Then I want to get the PFI information and I get stuck. There appears to be no way to get the LastTransformer from the trainedModel.

var transformedData = trainedModel.Transform(trainData);
var linearPredictor = trainedModel.LastTransformer; // does not compile: ITransformer does not expose LastTransformer

var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
    linearPredictor, transformedData, permutationCount: 30);

Hope someone can help me with some guidance.

jedsmallwood commented 5 years ago

I'm interested in a solution to this also. It seems like a good way to reduce the number of features if you can identify which features are important.

justinormont commented 5 years ago

@daholste: Do you think this simply needs to be cast into the right type which has .LastTransformer as a property?

Possibly related comic: https://blog.toggl.com/build-horse-programming/

daholste commented 5 years ago

First and foremost, I love that comic, @justinormont

+1, the C# segment of the comic feels apropos. If you inspect the model in the debugger GUI, you should be able to navigate to the last transformer. By casting the C# objects to the types you see in the debugger, you can write lines of C# code that correspond to that navigation in the GUI.

Of course, this is terribly hacky. Off-hand, I'm not aware of an officially supported / less hacky way to do this. It could be a great area of focus for future development
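As a sketch of what that debugger-driven approach might look like (hypothetical; the exact types depend on which trainer AutoML selected), `TransformerChain<ITransformer>` also implements `IEnumerable<ITransformer>`, so you can walk the chain without naming the final transformer's concrete type:

```csharp
// Cast the opaque ITransformer returned by AutoML to the chain type
// observed in the debugger.
var chain = (TransformerChain<ITransformer>)bestRun.Model;

// Enumerate the chain; the last element is the prediction transformer.
ITransformer last = null;
foreach (var transformer in chain)
    last = transformer;
// 'last' can then be cast further (e.g. to a prediction transformer
// interface) before handing it to PFI.
```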

jedsmallwood commented 5 years ago

The following cast lets me access the LastTransformer; however, I cannot use it for PFI until I provide a better type for predictor. Debugging, I can see it is of type Microsoft.ML.Data.RegressionPredictionTransformer&lt;Microsoft.ML.IPredictorProducing&gt;, but I am unable to cast to that because Microsoft.ML.IPredictorProducing is not visible, so it seems like we're still stuck.

//setup code similar to famschopman 
RegressionExperiment experiment = mlContext.Auto().CreateRegressionExperiment(experimentSettings);

var experimentResults = experiment.Execute(split.TrainSet, split.TestSet);
var predictor = ((TransformerChain<ITransformer>)experimentResults.BestRun.Model).LastTransformer;

//this will not compile.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(predictor, transformedData, permutationCount: 30);

The following compile error is produced.

The type arguments for method 'PermutationFeatureImportanceExtensions.PermutationFeatureImportance<TModel>(RegressionCatalog, ISingleFeaturePredictionTransformer<TModel>, IDataView, string, bool, int?, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly.

eerhardt commented 5 years ago

See my analysis on https://github.com/dotnet/machinelearning/issues/3976 as well. These two issues feel like they are the same thing.

antoniovs1029 commented 4 years ago

The only thing needed to make this build and run was to add the (TransformerChain&lt;ITransformer&gt;) cast to BestRun.Model (recommended in https://github.com/dotnet/machinelearning/issues/3972#issuecomment-521288508), and then add another cast to (ISingleFeaturePredictionTransformer&lt;object&gt;) for the linear predictor. That would have been enough to let you run PFI:

RunDetail<BinaryClassificationMetrics> bestRun = experimentResult.BestRun;
TransformerChain<ITransformer> trainedModel = (TransformerChain<ITransformer>)bestRun.Model;
var predictions = trainedModel.Transform(testData);

var linearPredictor = (ISingleFeaturePredictionTransformer<object>)trainedModel.LastTransformer;

var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
    linearPredictor, predictions, permutationCount: 30);

PS: There was a bug (#4517) when running PFI specifically with binary classification models, so even after getting this to build, if AutoML had returned a non-calibrated binary model, running PFI would have thrown an exception. That bug was fixed in #4587, which was included in ML.NET 1.5.0-preview2 and 1.5.0.

> See my analysis on #3976 as well. These two issues feel like they are the same thing.

The problem described there was fixed in #4262 and #4292. Still, that problem wasn't actually causing this one; the solution I described above would have worked even then. The problem referred to there is not being able to cast a model loaded from disk to its actual type (e.g. BinaryPredictionTransformer&lt;ParameterMixingCalibratedModelParameters&lt;IPredictorProducing&lt;float&gt;, ICalibrator&gt;&gt;). After that fix, users can cast to the actual type, but they could always cast to (ISingleFeaturePredictionTransformer&lt;object&gt;), which is more appropriate when using AutoML.NET, since users won't know in advance the actual type of the model returned by the experiment. So the point is that it was always possible to use PFI with AutoML by applying the (ISingleFeaturePredictionTransformer&lt;object&gt;) cast I described above.
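For completeness, a minimal sketch of that loaded-model scenario (untested; it assumes a binary classification model saved earlier with mlContext.Model.Save, and reuses the _modelPath and testData names from the snippets above):

```csharp
// Load the saved model. With AutoML the winning trainer's concrete type
// isn't known ahead of time, so cast only to the generic chain and
// interface types rather than a specific prediction transformer.
ITransformer loadedModel = mlContext.Model.Load(_modelPath, out DataViewSchema inputSchema);
var chain = (TransformerChain<ITransformer>)loadedModel;

// This cast works regardless of which trainer won the experiment.
var predictor = (ISingleFeaturePredictionTransformer<object>)chain.LastTransformer;

var transformedData = chain.Transform(testData);
var permutationMetrics = mlContext.BinaryClassification.PermutationFeatureImportance(
    predictor, transformedData, permutationCount: 30);
```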