Closed mayer79 closed 4 years ago
@pbiecek I'm hitting an issue where I need to specify sample weights in ingredients::feature_importance()
calculation. After a bit of thought I'm thinking maybe this belongs in the explainer construction. Please let us know what you think makes sense.
@kevinykuo
Do you think it would be enough to add vector for sample weights to the explainer in the same way as data
/y
?
As an additional argument of the explain()
function.
@pbiecek I think that would work.
@kevinykuo in DALEX v0.4.9 explainer
has a slot weights
. https://modeloriented.github.io/DALEX/reference/explain.html
@pbiecek Awesome thanks! Now for the downstream functions/parameters, e.g. loss_function
for feature_importance()
and its default DALEX::loss_root_mean_square()
, would it be possible to add weights
to their signatures without breaking backwards compatibility?
@kevinykuo I guess that there are at least two options:
add weights to all loss_...
functions, if not NULL then to weighted loss. Pros: relatively small change in other functions. Cons: this shall be done also for loss functions outside DALEX. Weights will not work for many explanations.
add new loss functions like loss_weighted_sum_of_squares
and let user to explicitly call loss_weighted_sum_of_squares
for weighted_feature_importance
. Pros: user will explicitly control what loss function is going to use. Cons: there will be more loss functions.
For the second option we may add a new weighted_feature_importance
function. It wont be a silent change, but it will be easier to recognize for which loss functions one can use weights and for which one cannot.
Thought about this some more and I'll cast a vote for option 1) as it won't clutter the API as much and we can throw messages/warnings when users try to use a weighted loss function with explanations that don't support them.
@pbiecek what about making loss_default
function recognize if weights are set and then applying loss_function according to them?
flashlight
is doing great job in working with case weights
in DALEX it looks like you can work with case weights but you need to specify your own loss functions.
Wonderful package, thanks a lot, I am using it quite frequently.
Here are some suggestions, especially having models with case weights in mind:
The function
model_performance
allows to pass a loss function of the residuals (as single argument) only. This seems too restrictive as many loss functions cannot be written as a function of residuals but rather of the observed value and the predicted one. Most DALEX functions allow a general loss function with two arguments, not just one. This inconsitency is e.g. a problem when we want to plot deviance residuals of a logistic regression.Ability to enhance plots with ggplot functions, i.e. allowing to add e.g.
+ geom_points(...)
variable_importance
: In our case, the custompredict
function depends on a weight vectorw
(basically the raw predictions are multiplied byw
). Since the DALEX interface requires this vector to be a column in the data frame passed toexplain
,w
currently appears in thevariable_importance
output as well. Would it be an option to add an optional "dropVars = NULL" argument tovariable_importance
(or even inexplain
)?Like issue 3, but for the
single_prediction
part. Here, it is not only an aesthetic problem if a "special" column like case weightw
appears.