ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation
https://dalex.drwhy.ai
GNU General Public License v3.0

can dalex generate reports if only given features, label, and prediction? #531

Closed: josephykwang closed this issue 1 year ago

josephykwang commented 1 year ago

Our models are not trained with R or scikit-learn; we have our own model format.

Do you have examples of generating the various reports when given only a data frame where each row contains features, the label, and the prediction? The data frame can contain 100M+ records.

hbaniecki commented 1 year ago

Hi @josephykwang, I am not aware of such examples.

There are multiple thoughts that come to mind.

First, many explanations like Break-down, Shapley values, What-if analysis, Partial dependence etc. require predicting on newly created observations (which may not be contained in the original dataset). So in principle, it is not possible to estimate these explanations from such a data frame alone.

Second, what do you mean by the word "reports"? dalex outputs data frames and plots; the dalex.Arena module generates interactive dashboards for exploration.

Third, accurately estimating explanations on 100M+ records in finite time will be challenging.
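To make the first point concrete, here is a from-scratch sketch of partial dependence using only NumPy and a stand-in `predict` function (the function is hypothetical, not dalex code). The grid values are synthesized, so the model must be scored on observations that never appear in the original data frame; stored per-row predictions cannot supply these.

```python
import numpy as np

# Hypothetical black-box predict function: the explainer only needs
# callable access to it, not the model internals or training library.
def predict(X):
    return 2.0 * X[:, 0] + X[:, 1] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))  # features from the "original" data frame

# Partial dependence of feature 0: overwrite its column with grid values
# that need NOT occur in the data, then average the fresh predictions.
grid = np.linspace(-1, 1, 5)
pd_values = []
for g in grid:
    X_mod = X.copy()
    X_mod[:, 0] = g                 # newly created observations
    pd_values.append(predict(X_mod).mean())

print(np.round(pd_values, 3))
```

For this linear-in-feature-0 model the profile rises by exactly 2 per unit of the grid, which is the kind of shape a partial dependence plot recovers.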

josephykwang commented 1 year ago

For now, we don't need instance-level explanations. What is the largest dataset that you have tested? How many features?

hbaniecki commented 1 year ago

For now, we don't need instance-level explanations.

Permutation-based variable importance and partial dependence plots are model-level (global) explanations, but they too require predict inference.

What is the largest dataset that you have tested? How many features?

Tested with which method, exactly? You need to be more specific.
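A from-scratch sketch of permutation-based variable importance (plain NumPy, not the dalex implementation) shows why even this global method needs predict inference: every permuted copy of the data must be scored by the model, so cached predictions for the original rows are not enough.

```python
import numpy as np

# Hypothetical black-box model: only feature 0 influences the output.
def predict(X):
    return 3.0 * X[:, 0]

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = predict(X)  # labels generated by the model itself, for the sketch

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

baseline = rmse(y, predict(X))
importance = {}
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the feature-target link
    # predict must be called again on the permuted data -- this is the
    # inference step that a features/label/prediction dump cannot replace.
    importance[j] = rmse(y, predict(X_perm)) - baseline

print(importance)  # feature 0 large, feature 1 exactly 0 (it is ignored by predict)
```

On 100M+ rows this loop means one full inference pass per feature per permutation, which is where the runtime concern above comes from; subsampling the rows before permuting is the usual mitigation.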

hbaniecki commented 1 year ago

Closing due to no response.

Reopen if needed.