TeamHG-Memex / eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions
http://eli5.readthedocs.io
MIT License

Got completely different results when changing n_samples for TextExplainer #330

Open kevinyang372 opened 4 years ago

kevinyang372 commented 4 years ago

Hi! Thank you for developing this amazing explainer for machine learning models.

I am currently trying to use eli5's LIME TextExplainer to get some insight into the BERT model I developed for a Japanese text classification problem. With n_samples around 500, I got metrics of mean_KL_divergence: 0.044, score: 1.0. However, when I increase n_samples to 1000, the explainer gives almost opposite weight explanations (text previously highlighted in green is now highlighted in red), while the metrics still look good: mean_KL_divergence: 0.0298, score: 1.0.

Is this due to the nonlinearity of the BERT model? Do you have any advice on how I could solve this problem?
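For context on what the reported metric measures: mean_KL_divergence is the average KL divergence between the black-box model's predicted probabilities and the white-box surrogate's predictions over the sampled texts, so a low value only says the surrogate tracks the model on those samples. A minimal stdlib sketch of the underlying quantity (the probability vectors here are toy values, not real BERT output):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions.

    eps guards against log(0) for near-zero probabilities.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Black-box (e.g. BERT) prediction vs. surrogate prediction for one sampled text.
bert_probs = [0.9, 0.1]
surrogate_probs = [0.85, 0.15]
print(kl_divergence(bert_probs, surrogate_probs))  # small value: surrogate tracks the model
```

A small mean KL therefore does not by itself guarantee the surrogate's *weights* are stable; two different surrogates can both fit the sampled predictions well.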

import eli5
from eli5.lime import TextExplainer

# predict_proba is the prediction function of my own BERT pipeline
from pipe import predict_proba

# test is the Japanese document being explained
te = TextExplainer(n_samples=1000, random_state=42)
te.fit(test, predict_proba)

te.show_prediction()
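One way to diagnose this is to quantify how much the explanations disagree between the two runs, rather than eyeballing the highlighting. The sketch below is a toy illustration, not eli5 API: sign_agreement is a hypothetical helper, and the token-weight dicts are made-up values standing in for weights you would extract from a fitted explainer (e.g. via te.explain_prediction().targets[0].feature_weights):

```python
def sign_agreement(weights_a, weights_b):
    """Fraction of shared tokens whose weights have the same sign in both runs."""
    shared = set(weights_a) & set(weights_b)
    if not shared:
        return 0.0
    same = sum(
        1 for tok in shared
        if (weights_a[tok] > 0) == (weights_b[tok] > 0)
    )
    return same / len(shared)

# Toy token -> weight dicts mimicking two TextExplainer runs (n_samples=500 vs 1000).
run_500  = {"良い": 0.8, "悪い": -0.6, "映画": 0.1}
run_1000 = {"良い": -0.7, "悪い": 0.5, "映画": 0.2}

# Low agreement means the explanations flip sign between runs,
# i.e. the surrogate is unstable for this document.
print(sign_agreement(run_500, run_1000))
```

If agreement stays low across several random_state values at the same n_samples, the instability comes from the sampling itself rather than from the sample size alone.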