Probabilistic outputs look something like this: `[[0.1, 0.8, 0.1], [0.2, 0.8, 0.0]]`, where each row contains a normalized array. So the check above first verifies that we have a 2-d array and then rules out things of the form `[[0.2], [0.8]]`, which contain only one item in each row and therefore implicitly mean `[[0.2, 0.8], [0.8, 0.2]]` or something similar.
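For illustration, here is a minimal sketch of that shape-based check applied to both forms (the variable and function names below are just for this example, not the library's internals):

```python
import numpy as np

# Full per-class probabilities: one normalized row per sample.
probabilistic = np.array([[0.1, 0.8, 0.1], [0.2, 0.8, 0.0]])

# Single-column form: only one class probability per row, so the
# complementary class probability is left implicit.
single_column = np.array([[0.2], [0.8]])

def looks_probabilistic(outputs):
    """2-d with more than one column -> treat as per-class probabilities."""
    return len(outputs.shape) > 1 and outputs.shape[1] > 1

print(looks_probabilistic(probabilistic))  # True
print(looks_probabilistic(single_column))  # False: only one item per row
```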
I did once consider using `len(np.unique(.)) == 2`, but I then realized that this would catch the faulty case of only two observed predicted probabilities, e.g. `[x1, x2, ...]`, where each `xi` is `[0.0, 0.0, 1.0]` or `[0.0, 1.0, 0.0]`. So I think it's correct as it is. Do you have a specific example which is doing the wrong thing? It's always possible I didn't think of a use case.
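A quick sketch of that faulty case, just to make the pitfall concrete (illustrative only):

```python
import numpy as np

# Genuinely probabilistic outputs that happen to contain only two
# distinct values, because every row is a one-hot vector over three classes.
outputs = np.array([
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# A uniqueness-based check would flag this as "binary" even though
# each row is a probability vector over three classes.
print(len(np.unique(outputs)) == 2)  # True, which is the faulty catch
```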
Well, let's clarify. The documentation says:

`:param scoring_data: a 2-tuple (inputs, outputs) for scoring in the scoring_fn`

Are the outputs meant to be predictions or the target values?
`outputs` is intended to be the target values.
In that case, the target values should just be a column vector and not a 2-d array, correct? `outputs = [0, 1, 0, 1, 0, 1, 1, ...]`
I raise this issue because the logic above did not recognize that I was dealing with a column vector of binary outputs and therefore assumed I was making deterministic predictions.
In general, the outputs could be any possible shape. In fact, in my tests, I think I have everything ranging from a `None` object to a 2-d array. However, for the most common use case, you are right: the target values will be a 1-d array, a vector, or something of that shape. Do your outputs look like one of these: `outputs = [[0.2], [0.8], [0.1], [0.3]]` or `outputs = [0.2, 0.8, 0.1, 0.3]`?
The second case. Just a list of binary values.
Okay, I get what's going on. As we are discussing in the other issue (#83), this is a bit of a misnomer: here, probabilistic means that the data is of the form `[row1, row2, ...]`, where each `row = [prob class a, prob class b, prob class c, ...]`. The reason for this is the difference between the `predict` and `predict_proba` calls of the underlying sklearn models, which output different things. So while your data is actually probabilistic, it gets fed through my wrapper methods as though it were deterministic. The final `evaluation_fn` then gets it in exactly the same form and does something with it. If the scoring function is, for instance, `sklearn.metrics.roc_auc_score`, which expects exactly this 1-d list of probabilities, then my code will treat the data in the intermediate stages as "deterministic", but the end function knows that it is really probabilistic and will handle it correctly. Does everything run if you use `score_trained_sklearn_model` instead of `score_trained_sklearn_model_with_probabilities`?
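As a rough illustration of the difference between the two sklearn calls (a toy sketch, not the wrapper code itself):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Toy data, just to show the shapes involved.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

hard_labels = model.predict(X)        # shape (200,), values in {0, 1}
class_probs = model.predict_proba(X)  # shape (200, 2), each row sums to 1

# roc_auc_score expects a 1-d array of positive-class probabilities.
print(roc_auc_score(y, class_probs[:, 1]))
```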
I should say that everything runs fine. My main problem was that the original AUC (no permutations) was coming out as 0.65 when I know it should be closer to 0.9. I assumed the problem was the `predict` method, which outputs binary predictions rather than probabilistic predictions. So I manually changed the code as I suggested above and in the other issue (#83) and got the results I anticipated. The low AUC was also likely associated with issue #80. Ultimately, I fixed the problems on my end but wanted to raise the issues in case the fixes might be helpful for other users.
On line 82 in permutation_importance.py,

`if len(scoring_data[1].shape) > 1 and scoring_data[1].shape[1] > 1:`

the code attempts to determine whether the scoring outputs are probabilistic or not. However, I was confused by the logic. Assuming the scoring outputs are a 1-d vector (e.g., `[0, 1, 1, 0, 0, 0, 1]`), then `len(scoring_data[1].shape) == 1`. A possible fix may be

`if len(np.unique(scoring_data[1])) == 2:`

which instead asks whether the output values are binary.
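For reference, a small sketch of how the two checks behave on a 1-d vector of binary targets (variable names are illustrative):

```python
import numpy as np

# A 1-d vector of binary target values, as described in this issue.
targets = np.array([0, 1, 1, 0, 0, 0, 1])

# Check from line 82: is the array 2-d with more than one column?
original_check = len(targets.shape) > 1 and targets.shape[1] > 1
print(original_check)   # False -> treated as deterministic

# Suggested alternative: are the observed values binary?
suggested_check = len(np.unique(targets)) == 2
print(suggested_check)  # True -> would be treated as probabilistic
```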