[spike] Investigate Scikit-Learn's decision_function and score_samples

Is your enhancement request related to a problem? Please describe

Scikit-Learn's Pipeline API provides two additional methods that are not covered by Rubicon's Scikit-Learn integration: decision_function and score_samples.

Further investigate decision_function and score_samples to determine whether they should be integrated into Rubicon:

[x] Use case for decision_function rather than score?
[x] Does score use decision_function?
[x] When should score_samples be leveraged?
[x] Additionally, score_samples might be the solution to allow multiple_scores in Rubicon

Additional Sources

decision_function in Scikit-learn examples:

score_samples in Scikit-learn examples:

Scikit-learn defines decision_function as:

In a fitted classifier or outlier detector, predicts a “soft” score for each sample in relation to each class, rather than the “hard” categorical prediction produced by predict. Its input is usually only some observed data, X.

If the estimator was not already fitted, calling this method should raise an exceptions.NotFittedError.
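As a quick check of the not-fitted behavior described in the definition, here is a minimal sketch. The choice of LogisticRegression and the single input row are illustrative assumptions, not something the definition prescribes.

```python
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression

# An estimator that has never seen fit() (illustrative choice of classifier).
clf = LogisticRegression()

try:
    clf.decision_function([[0.0, 1.0]])
    raised = False
except NotFittedError:
    # scikit-learn refuses to score before the estimator is fitted.
    raised = True

print(raised)
```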

Scikit-learn's decision_function predicts soft scores for samples. Because predictions are out of scope for Rubicon, there is no need to implement logging for decision_function. Note that in cases like the EllipticEnvelope estimator, decision_function is called by predict, which in turn is used in score. Additionally, decision_function utilizes score_samples.
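The relationship between predict, decision_function, and score_samples in EllipticEnvelope can be checked with a short sketch. The random data, contamination value, and variable names below are assumptions made for illustration only.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Synthetic 2D data (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))

detector = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)

soft = detector.decision_function(X)  # one "soft" score per sample
raw = detector.score_samples(X)       # negative Mahalanobis distances
hard = detector.predict(X)            # +1 for inliers, -1 for outliers

# decision_function is score_samples shifted by the fitted offset_.
shift_matches = np.allclose(soft, raw - detector.offset_)

# predict thresholds the soft score at zero.
sign_matches = np.array_equal(hard, np.where(soft >= 0, 1, -1))

print(shift_matches, sign_matches)
```

Both checks should hold for this estimator; other estimators may wire these methods together differently.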

score_samples, on the other hand, computes a score for each individual sample. The sum or the mean of these per-sample scores is used to calculate score() for many estimators, such as the FactorAnalysis estimator, PCA estimator, BayesianGaussianMixture estimator, and HalvingGridSearchCV estimator. score_samples can also be used for density estimation. Scikit-learn examples show score_samples being used in Density Estimation, Density Estimation for a Gaussian Mixture, Kernel Density Estimation for Species Distributions, and Simple 1D Kernel Density Estimation.
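To make the aggregation concrete, the sketch below compares score() with the per-sample output of score_samples() for two estimators. PCA is one of the estimators named above; KernelDensity is added here as an assumption to show the sum case alongside the density estimation use. The data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

# Synthetic data (illustrative only).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))

# PCA: score() is the mean of the per-sample log-likelihoods.
pca = PCA(n_components=2).fit(X)
pca_per_sample = pca.score_samples(X)
pca_is_mean = np.isclose(pca.score(X), pca_per_sample.mean())

# KernelDensity: score() is the sum of the per-sample log-densities.
kde = KernelDensity(bandwidth=0.5).fit(X)
kde_per_sample = kde.score_samples(X)
kde_is_sum = np.isclose(kde.score(X), kde_per_sample.sum())

print(pca_is_mean, kde_is_sum)
```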

Since score_samples() can be used explicitly for density estimation and is used to calculate score(), Rubicon should support logging for score_samples(). Similar to the solution proposed in #176, when a user calls score_samples(), a new experiment should be opened unless the user specifies which experiment to log to.
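One possible shape for this behavior is sketched below in plain Python. Everything here is hypothetical: Experiment, Project, ScoreSamplesLogger, and ToyEstimator are stand-ins invented for illustration and are not Rubicon's actual classes or API. The sketch only shows the "open a new experiment unless one is given" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in for an experiment; not Rubicon's real class."""
    name: str
    metrics: dict = field(default_factory=dict)


@dataclass
class Project:
    """Hypothetical stand-in for a project that owns experiments."""
    experiments: list = field(default_factory=list)

    def new_experiment(self):
        experiment = Experiment(name=f"experiment-{len(self.experiments)}")
        self.experiments.append(experiment)
        return experiment


class ScoreSamplesLogger:
    """Wraps any object exposing score_samples(X) and logs its output."""

    def __init__(self, estimator, project):
        self.estimator = estimator
        self.project = project

    def score_samples(self, X, experiment=None):
        # Open a new experiment unless the caller names one to log to.
        if experiment is None:
            experiment = self.project.new_experiment()
        scores = list(self.estimator.score_samples(X))
        experiment.metrics["score_samples"] = scores
        return scores


class ToyEstimator:
    """Toy scorer: the negative absolute value of each 1D sample."""

    def score_samples(self, X):
        return [-abs(x) for x in X]


project = Project()
logger = ScoreSamplesLogger(ToyEstimator(), project)

logger.score_samples([1.0, -2.0])               # opens experiment-0
reused = project.experiments[0]
logger.score_samples([3.0], experiment=reused)  # logs to the given experiment
```

Passing the experiment as an optional keyword keeps the default call identical to scikit-learn's score_samples(X), which may matter if the wrapper is used inside an existing pipeline.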

capitalone / rubicon-ml

[spike] Investigate Scikit-Learn's decision_function and score_samples #179