Closed josephykwang closed 1 year ago
Hi @josephykwang, I am not aware of such examples.
There are multiple thoughts that come to mind.
First, many explanations like Break-down, Shapley values, What-if analysis, Partial dependence etc. require predictions on newly created observations (which may not be contained in the original dataset). So in principle, it is not possible to estimate such explanations using only such a data.frame.
Second, what do you mean by the word "reports"? dalex outputs data frames and plots. The dalex.Arena module generates interactive dashboards to explore.
Third, it will be challenging to accurately estimate explanations on 100M+ records in finite time.
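To illustrate the first point, here is a minimal pure-Python sketch of a partial dependence profile. It is not dalex code: `partial_dependence`, `toy_predict` and the feature names `x1`, `x2` are hypothetical, and `toy_predict` merely stands in for a model stored in a custom format. The sketch shows why a stored prediction column is not enough: each grid value produces modified rows that the model has never scored.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    profile = []
    for value in grid:
        # Each row is copied with the feature overwritten. These are new
        # observations that generally do not exist in the original data,
        # so they must be scored by calling the model.
        modified = [{**row, feature: value} for row in rows]
        profile.append((value, mean(predict(r) for r in modified)))
    return profile


# Hypothetical stand-in for a model kept in a custom format.
def toy_predict(row):
    return 2.0 * row["x1"] + row["x2"]


rows = [{"x1": 0.0, "x2": 1.0}, {"x1": 1.0, "x2": 3.0}]
pd_profile = partial_dependence(toy_predict, rows, "x1", [0.0, 1.0, 2.0])
print(pd_profile)
```

With a custom model format, the only requirement in this sketch is a callable that maps a row of features to a prediction.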
For now, we don't need instance-level explanations. What is the largest dataset that you have tested? How many features? (Sent by email in reply to Hubert Baniecki's comment of Friday, November 18, 2022.)
for now, we don't need instance level explanation.
Permutation-based variable importance and partial dependence plots are model-level (global) explanations, and they require model predictions (inference) too.
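As a rough sketch of why that is, assuming a regression task with RMSE as the loss (the thread does not say which task or loss applies), permutation-based importance can be written in plain Python as below. The names `permutation_importance`, `rmse` and `toy_predict` are hypothetical and this is not the dalex implementation.

```python
import random


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def permutation_importance(predict, rows, labels, feature, seed=0):
    """Increase in RMSE after shuffling the values of one feature."""
    baseline = rmse(labels, [predict(r) for r in rows])
    shuffled = [r[feature] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    # A stored prediction column cannot be reused here: the permuted rows
    # are new inputs, so the model has to be called again.
    return rmse(labels, [predict(r) for r in permuted]) - baseline


# Hypothetical stand-in for a model kept in a custom format; it ignores x3.
def toy_predict(row):
    return 2.0 * row["x1"] + row["x2"]


rows = [{"x1": float(i), "x2": float(i % 3), "x3": float(i % 2)} for i in range(20)]
labels = [toy_predict(r) for r in rows]

importance_x1 = permutation_importance(toy_predict, rows, labels, "x1")
importance_x3 = permutation_importance(toy_predict, rows, labels, "x3")
```

On very large data, this kind of computation is usually run on a random subsample of rows, since every permuted feature costs one extra pass of predictions.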
what is the largest dataset that you have tested? how many features?
Tested which method exactly? You need to be more specific.
closing due to no response
reopen if needed
Our models are not trained using R or scikit-learn. We have our own model format.
Do you have examples of generating the various reports when only a dataframe is provided, where each row has features, a label, and a prediction? The dataframe can contain 100M+ records.
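For context, model performance measures are one kind of report that depends only on the label and prediction columns, so they can be computed without access to the model. Below is a minimal sketch, assuming a regression task and data read in chunks of (label, prediction) pairs. The function name `streaming_regression_metrics` and the chunk layout are hypothetical, not part of dalex.

```python
import math


def streaming_regression_metrics(chunks):
    """Accumulate RMSE and MAE over an iterable of chunks of (label, prediction) pairs."""
    n = 0
    squared_sum = 0.0
    absolute_sum = 0.0
    for chunk in chunks:
        for label, prediction in chunk:
            residual = label - prediction
            squared_sum += residual * residual
            absolute_sum += abs(residual)
            n += 1
    if n == 0:
        raise ValueError("no rows were provided")
    return {"n": n, "rmse": math.sqrt(squared_sum / n), "mae": absolute_sum / n}


# Two small chunks standing in for batches read from a much larger table.
chunks = [[(1.0, 1.0), (2.0, 4.0)], [(3.0, 1.0)]]
metrics = streaming_regression_metrics(chunks)
```

Only running sums are kept in memory, so the number of rows is limited by read time and not by RAM.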