TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

Probabilities do not sum to 1 ValueError in show_prediction() #342

Open aakaashjois opened 4 years ago

aakaashjois commented 4 years ago

I have a REST API which returns the predictions for my model. I have written a wrapper function, get_prob, around it so that it returns results in the same shape as scikit-learn's predict_proba() method.
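Roughly, get_prob looks like this (a simplified sketch; the endpoint URL and response format are placeholders for my actual service):

import numpy as np
import requests

def get_prob(docs):
    # Send the texts to the prediction API and collect the
    # per-class probabilities for each document.
    response = requests.post('http://localhost:8000/predict',  # placeholder URL
                             json={'texts': list(docs)})
    # The service answers with one row of class probabilities per text,
    # so the result has shape (n_samples, n_classes), like predict_proba().
    return np.array(response.json()['probabilities'])

This is the code I am running: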

from eli5.lime import TextExplainer

te = TextExplainer()
test = 'Testing text explainer'
te.fit(test, get_prob)  # get_prob is the wrapper function
te.show_prediction()

This results in a ValueError: probabilities do not sum to 1.

Checking the sum of probabilities manually gives:

get_prob(te.samples_).sum(axis=1)
array([1.        , 0.99999994, 0.99999994, ..., 1.        , 1.        ,
       1.        ])

They are all very close to 1 or exactly 1. Is there a way to get past this error?

forough71 commented 4 years ago

I have the exact same problem.

Querela commented 3 years ago

My 'solution': in the binary case, I just take one side and use the complement (1 - p) for the other, so each row sums to exactly 1. Since I 'know' it already almost sums to 1, this is OK, but a better solution would be good. Maybe the sanity check could use a tolerance, e.g. 1e-7? (In my case the difference is in the 8th digit.)
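In code, the binary workaround is roughly this (a sketch; query_service stands in for whatever call actually returns the positive-class probabilities):

import numpy as np

def get_prob_binary(docs):
    # query_service is a placeholder for the real prediction call;
    # assumed to return one positive-class probability per document.
    p = np.asarray(query_service(docs), dtype=float)
    # Rebuild the other column as the complement so each row sums to exactly 1.
    return np.column_stack([1.0 - p, p])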

A more general solution, inspired by: https://stackoverflow.com/a/25985217/9360161

import numpy as np

a = np.array([[0.4, 0.5], [0.6, 0.4]])
b = a.sum(axis=1).astype(float)[:, np.newaxis]   # row sums, shape (2, 1)
c = a / b                                        # renormalize each row
assert np.allclose(c.sum(axis=1), np.ones(2))    # rows now sum to 1

print(c)
# [[0.44444444 0.55555556]
#  [0.6        0.4       ]]

So, just updating the (user-defined) get_prob function to renormalize each row should suffice.
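For example (a sketch; get_prob is the user-defined wrapper from above, and get_prob_normalized is just an illustrative name):

import numpy as np

def get_prob_normalized(docs):
    probs = np.asarray(get_prob(docs), dtype=float)
    # Rescale each row to sum to exactly 1 before handing it to TextExplainer.
    return probs / probs.sum(axis=1, keepdims=True)

te.fit(test, get_prob_normalized)
te.show_prediction()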