Can you please share the code you used to print those values to check a couple of things?
Pleasure and thank you for your help!
The logging functions:
private void LogModelWeights(LinearBinaryModelParameters subModel, string name)
{
    var weights = subModel.Weights.ToList();

    // Log the model parameters.
    Logger.Info(name + "Parameters");
    Logger.Info("Bias: " + subModel.Bias);
    Logger.Info("Feature Weights:");

    // Log each feature's learned weight.
    for (int i = 0; i < features.Length; i++)
    {
        contributions[i].Weight = weights[i];
        // The contribution will be assigned by the prediction engine,
        // using CalculateFeatureContribution (below).
        contributions[i].Contribution = 0;
        Logger.Info(" Feature: " + contributions[i].Name + "Weight: " + contributions[i].Weight);
    }
}
private void LogPermutationMetrics(IDataView transformedData,
    ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics)
{
    // Map each feature slot back to the column it came from, so that vector
    // columns contribute one entry per slot.
    var allFeatureNames = GetColumnNamesUsedForPFI(transformedData);
    var mapFields = new List<string>();
    for (int i = 0; i < allFeatureNames.Count(); i++)
    {
        var slotField = new VBuffer<ReadOnlyMemory<char>>();
        if (transformedData.Schema[allFeatureNames[i]].HasSlotNames())
        {
            transformedData.Schema[allFeatureNames[i]].GetSlotNames(ref slotField);
            for (int j = 0; j < slotField.Length; j++)
            {
                mapFields.Add(allFeatureNames[i]);
            }
        }
        else
        {
            mapFields.Add(allFeatureNames[i]);
        }
    }

    // Now let's look at which features are most important to the model overall.
    // Get the feature indices sorted by their impact on AUC. The importance,
    // i.e. the absolute average change in AreaUnderRocCurve calculated by
    // PermutationFeatureImportance, can then be ordered from most important
    // to least important.
    var sortedIndices = permutationMetrics
        .Select((metrics, index) => new { index, metrics.AreaUnderRocCurve })
        .OrderByDescending(
            feature => Math.Abs(feature.AreaUnderRocCurve.Mean));

    Console.WriteLine("Feature indices sorted by their impact on AUC:");
    foreach (var feature in sortedIndices)
    {
        Console.WriteLine($"{mapFields[feature.index],-20}|\t{Math.Abs(feature.AreaUnderRocCurve.Mean):F6}");
    }

    Console.WriteLine("PFI AUC logged as the following:");
    // Combine metrics with feature names and format for display.
    for (int i = 0; i < permutationMetrics.Length; i++)
    {
        Logger.Info($"{importances[i].Name}|\t{permutationMetrics[i].AreaUnderRocCurve.Mean:F6}");
        importances[i].AUC = permutationMetrics[i].AreaUnderRocCurve.Mean;
    }
}
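For completeness, the supporting class members these two functions use are roughly as follows (simplified; the actual declarations may differ slightly):

// Simplified sketch of the members referenced above.
private string[] features;                    // feature names, in slot order
private FeatureContribution[] contributions;  // one entry per feature
private FeatureImportance[] importances;      // one entry per feature slot

// GetColumnNamesUsedForPFI(IDataView) is a helper that returns the names of
// the columns that make up the Features vector.

private class FeatureContribution
{
    public string Name;
    public float Weight;
    public float Contribution;
}

private class FeatureImportance
{
    public string Name;
    public double AUC;
}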
Hi @lefig - can you share the code that generates the objects passed to these logging functions?
LinearBinaryModelParameters subModel
IDataView transformedData
ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics
Please also share code for any data processing and model training.
PFI values of 0 for features mean that permuting the feature's values did not change AreaUnderRocCurve much. This is not the same as the weight learned by the model being 0: you can have non-zero weights for a feature that are not statistically significant, and you could end up with a situation where the PFI metrics are 0.
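In other words, what PermutationFeatureImportance reports for a feature f is roughly:

PFI(f) = mean over permutations of [ AreaUnderRocCurve(data with column f shuffled) - AreaUnderRocCurve(original data) ]

so a PFI near 0 just means shuffling the feature barely moved AreaUnderRocCurve, which can happen even when the feature has a sizable learned weight.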
Note that the PFI value is just one indicator of feature importance, not a conclusive statement of it. That said, so many features having a PFI of 0 warrants some further investigation. Here are a few reasons I can think of that could possibly explain this:
1. The permutationCount used for calculating PFI is 1 (or a small number). Please double check that the value of this argument is something reasonable (try something like 10 or 30; see the sketch after this list).
2. The change in AreaUnderRocCurve isn't very large when a feature is permuted. What is the actual AreaUnderRocCurve of this model evaluated on the training and test data? An AreaUnderRocCurve of ~0.5 or ~0.6 would indicate a particularly poor model, which you would expect to be about as poor when a feature is permuted, hence no change in AreaUnderRocCurve.
3. Are you calculating the ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics on a very small dataset? That could give rise to 0 change in AreaUnderRocCurve.
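To address (1) and (3), the call might look something like this (a sketch only; predictionTransformer and transformedData are placeholders for your own pipeline objects):

// Sketch: compute PFI with a larger permutationCount, over the whole dataset
// (leaving numberOfExamplesToUse unset means all rows are used).
ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics =
    mlContext.BinaryClassification.PermutationFeatureImportance(
        predictionTransformer: predictionTransformer, // your trained binary prediction transformer
        data: transformedData,                        // your full transformed dataset
        labelColumnName: "Label",
        permutationCount: 30);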
Hi @najeeb-kazmi
Thank you for your kind help. The code that generates the metrics is as follows (this is an example of one such learner that requires a calibrator).
private void CalculateGamCalibratedClassificationPermutationFeatureImportance(MLContext mlContext,
    IDataView transformedData, ITransformer trainedModel, string learner)
{
    // Extract the trainer (the last transformer in the model).
    var singleTrainerModel = trainedModel as BinaryPredictionTransformer<CalibratedModelParametersBase<
        GamBinaryModelParameters, PlattCalibrator>>;

    // Calculate permutation feature importance.
    ImmutableArray<BinaryClassificationMetricsStatistics> permutationMetrics =
        mlContext.BinaryClassification.PermutationFeatureImportance(
            predictionTransformer: singleTrainerModel,
            data: transformedData,
            labelColumnName: "Label",
            numberOfExamplesToUse: 100,
            permutationCount: 50);

    Logger.Info("Calculating Binary Classification Feature PFI");
    Logger.Info("Feature PFI for learner:" + learner);
    LogPermutationMetrics(transformedData, permutationMetrics);
}
I tend to think (your point 2) that the model is poor and needs some features removed. Hence I was hoping to get some insight into the names of those features so that I can proceed with changing the model.
Best wishes, Fig
@lefig
This is a GAM model; your original comment was for a linear model. For which model are you seeing 0 PFI metrics? For a linear model, you can use L1 regularization to remove unimportant features (force their weights to be 0). For GAM models, see this example for how to understand feature importance. Basically, features whose bin effects are relatively flat are less important (and could be removed), while features whose bin effects show some trend are more important. Try PFI after these features have been removed (see the sketch at the end of this comment).
What is the AUC of this model?
Maybe using only 100 rows (numberOfExamplesToUse: 100 in your code) is the reason you are not seeing non-zero PFI. Try using the entire dataset.
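Along the lines of that GAM example, here is a rough sketch of inspecting the bin effects (it reuses the cast from the code you shared; the GamBinaryModelParameters members assumed here are NumberOfShapeFunctions, GetBinUpperBounds, and GetBinEffects):

// Rough sketch: pull the GAM parameters out of the calibrated model and look
// for features whose shape functions are essentially flat.
var gam = singleTrainerModel.Model.SubModel; // GamBinaryModelParameters

for (int i = 0; i < gam.NumberOfShapeFunctions; i++)
{
    var binUpperBounds = gam.GetBinUpperBounds(i); // bin boundaries for feature i
    var binEffects = gam.GetBinEffects(i);         // learned effect per bin

    // A near-zero range means the shape function is flat, i.e. the feature
    // contributes little and is a candidate for removal.
    var range = binEffects.Max() - binEffects.Min();
    Console.WriteLine($"Feature {i}: bin effect range = {range:F6}");
}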
@lefig any update on this and the information I requested? Also, did any of my suggestions help in debugging this?
I'm curious to see why this is happening, as it is quite unusual. As I mentioned, it's not clear which model is giving you 0 PFI, GAM or linear. It would be nice to see a reproducible example so I can debug this (a small snippet of the data and the actual code for training the model and calculating PFI).
Hi @najeeb-kazmi
I really appreciate your time and help with this. Please let me generate some further test data and I will get back to you.
@lefig if this is still an issue, please feel free to reopen.
Hi all,
I have been doing a little deep dive into some of my models in order to understand a little more about feature relevance. My results from running feature explanatory analysis are as follows for binary classification:
2020-01-08 11:34:03.813 +00:00 [INF] BinaryFastTreeParameters
2020-01-08 11:34:03.815 +00:00 [INF] Bias: 0
2020-01-08 11:34:03.816 +00:00 [INF] Feature Weights:
2020-01-08 11:34:03.843 +00:00 [INF] Feature: CloseWeight: 0.1089412
2020-01-08 11:34:03.931 +00:00 [INF] Feature: OpenWeight: 0.3691619
2020-01-08 11:34:03.932 +00:00 [INF] Feature: HighWeight: 0.06676193
2020-01-08 11:34:03.933 +00:00 [INF] Feature: LowWeight: 0.1926264
2020-01-08 11:34:03.934 +00:00 [INF] Feature: STO_FastStochWeight: 0.19846
2020-01-08 11:34:03.938 +00:00 [INF] Feature: STO_StochKWeight: 0.5019926
2020-01-08 11:34:03.941 +00:00 [INF] Feature: STO_StochDWeight: 0.3781931
2020-01-08 11:34:03.942 +00:00 [INF] Feature: STOWeight: 0
2020-01-08 11:34:03.943 +00:00 [INF] Feature: CCI_TypicalPriceAvgWeight: 0.131141
2020-01-08 11:34:03.944 +00:00 [INF] Feature: CCI_TypicalPriceMADWeight: 0.1299266
2020-01-08 11:34:03.946 +00:00 [INF] Feature: CCIWeight: 1
2020-01-08 11:34:03.947 +00:00 [INF] Feature: RSIDownWeight: 0.4761779
2020-01-08 11:34:03.948 +00:00 [INF] Feature: RSIUpWeight: 0.1249975
2020-01-08 11:34:03.951 +00:00 [INF] Feature: RSIWeight: 0.2877662
2020-01-08 11:34:03.952 +00:00 [INF] Feature: MOMWeight: 0.1822069
2020-01-08 11:34:03.953 +00:00 [INF] Feature: ADX_PositiveDirectionalIndexWeight: 0.2435836
2020-01-08 11:34:03.954 +00:00 [INF] Feature: ADX_NegativeDirectionalIndexWeight: 0.4263106
2020-01-08 11:34:03.955 +00:00 [INF] Feature: ADXWeight: 0.1899773
2020-01-08 11:34:03.956 +00:00 [INF] Feature: CMOWeight: 0.2601428
But for PFI I have the following:
2020-01-08 11:34:09.369 +00:00 [INF] Calculating Binary Classification Feature PFI
2020-01-08 11:34:09.371 +00:00 [INF] Feature PFI for learner:BinaryFastTree
2020-01-08 11:34:09.383 +00:00 [INF] Close| 0.000000
2020-01-08 11:34:09.384 +00:00 [INF] Open| 0.000000
2020-01-08 11:34:09.385 +00:00 [INF] High| 0.000000
2020-01-08 11:34:09.386 +00:00 [INF] Low| 0.000000
2020-01-08 11:34:09.391 +00:00 [INF] STO_FastStoch| 0.000000
2020-01-08 11:34:09.400 +00:00 [INF] STO_StochK| 0.000000
2020-01-08 11:34:09.401 +00:00 [INF] STO_StochD| 0.000000
2020-01-08 11:34:09.402 +00:00 [INF] STO| 0.000000
2020-01-08 11:34:09.404 +00:00 [INF] CCI_TypicalPriceAvg| 0.000000
2020-01-08 11:34:09.406 +00:00 [INF] CCI_TypicalPriceMAD| 0.000113
2020-01-08 11:34:09.408 +00:00 [INF] CCI| 0.000000
2020-01-08 11:34:09.414 +00:00 [INF] RSIDown| 0.000221
2020-01-08 11:34:09.416 +00:00 [INF] RSIUp| 0.000000
2020-01-08 11:34:09.431 +00:00 [INF] RSI| 0.000000
2020-01-08 11:34:09.443 +00:00 [INF] MOM| -0.003003
2020-01-08 11:34:09.457 +00:00 [INF] ADX_PositiveDirectionalIndex| 0.000000
2020-01-08 11:34:09.467 +00:00 [INF] ADX_NegativeDirectionalIndex| 0.000000
2020-01-08 11:34:09.470 +00:00 [INF] ADX| 0.000000
2020-01-08 11:34:09.479 +00:00 [INF] CMO| 0.000000
My question is essentially: what should I read (if anything) into zero values for PFI? The evaluation score, too:
2020-01-08 11:34:17.135 +00:00 [INF] Score: -4.640871
2020-01-08 11:34:17.138 +00:00 [INF] Probability: 0.1351293
I would appreciate any thoughts you may have on using this information to improve the model's quality.
Thank you, Fig