Adds two user functions: explain_lingauss() and explain_lingauss_precomputed().
These allow fast computation of Shapley values for purely linear models (i.e. no interactions, quadratic terms, etc.) under the assumption that the features follow a Gaussian distribution.
The implementation is based on Sec. 2 of https://arxiv.org/pdf/2006.16234.pdf, but uses somewhat simplified formulae for Tmu and Tx that avoid the need to compute the Q-matrix, as it always takes the same form.
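To make the linear + Gaussian setting concrete, here is a minimal sketch for the two-feature case. It is not the Tmu/Tx formulation used in this PR; it only shows why the contribution function v(S) = E[f(X) | X_S = x_S] has a closed form when f is linear and the features are Gaussian. The function names and the toy numbers are hypothetical and written in Python purely for illustration.

```python
# Toy illustration only (hypothetical names, not the package's code):
# exact Shapley values for f(x) = beta0 + beta[0]*x[0] + beta[1]*x[1]
# when (X_0, X_1) is bivariate Gaussian with mean mu and covariance cov.

def conditional_mean(mu_a, mu_b, var_b, cov_ab, x_b):
    """E[X_a | X_b = x_b] for a bivariate Gaussian."""
    return mu_a + cov_ab / var_b * (x_b - mu_b)


def shapley_linear_gaussian_2d(beta0, beta, mu, cov, x):
    """Return (v(empty set), (phi_0, phi_1)) for the two-feature model."""

    def f(z):
        return beta0 + beta[0] * z[0] + beta[1] * z[1]

    # Because f is linear, E[f(X) | X_S = x_S] equals f evaluated at the
    # conditional mean of the features that are not in S.
    v_empty = f(mu)
    v_0 = f((x[0], conditional_mean(mu[1], mu[0], cov[0][0], cov[0][1], x[0])))
    v_1 = f((conditional_mean(mu[0], mu[1], cov[1][1], cov[0][1], x[1]), x[1]))
    v_full = f(x)

    # With two features there are only two orderings, so the Shapley value
    # is the plain average of the two marginal contributions.
    phi_0 = 0.5 * ((v_0 - v_empty) + (v_full - v_1))
    phi_1 = 0.5 * ((v_1 - v_empty) + (v_full - v_0))
    return v_empty, (phi_0, phi_1)


# Example call with made-up numbers.
base, phi = shapley_linear_gaussian_2d(
    beta0=1.0,
    beta=(2.0, -1.0),
    mu=(0.0, 1.0),
    cov=((1.0, 0.5), (0.5, 2.0)),
    x=(1.5, 0.0),
)
```

With zero covariance the result reduces to the familiar beta_j * (x_j - mu_j), and in general base + phi_0 + phi_1 equals f(x).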
The permutation-based Shapley estimation approach is used here instead of the kernelSHAP estimation approach used elsewhere in the package. Another PR will make that universally available.
Pairwise sampling is always applied (there is currently no option to disable it).
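For readers unfamiliar with the permutation approach, a rough sketch follows. It assumes that "pairwise sampling" here means evaluating each sampled permutation together with its reverse; that is a common antithetic scheme, but it is an assumption about this PR and not taken from its code. The name permutation_shapley and the generic value callable are hypothetical, and Python is used only for illustration.

```python
import random


def permutation_shapley(value, n_features, n_pairs, seed=0):
    """Estimate Shapley values from sampled feature permutations.

    `value` maps a frozenset of feature indices to a number, i.e. v(S).
    Each sampled permutation is paired with its reverse (assumed pairing).
    """
    rng = random.Random(seed)
    phi = [0.0] * n_features
    n_perms = 0
    for _ in range(n_pairs):
        perm = list(range(n_features))
        rng.shuffle(perm)
        for order in (perm, perm[::-1]):
            coalition = set()
            prev = value(frozenset(coalition))
            for j in order:
                # Marginal contribution of feature j when it joins the
                # features that precede it in this ordering.
                coalition.add(j)
                cur = value(frozenset(coalition))
                phi[j] += cur - prev
                prev = cur
            n_perms += 1
    # Average the marginal contributions over all evaluated orderings.
    return [p / n_perms for p in phi]


# Example: an additive game, where every ordering gives the same answer.
weights = [1.0, -2.0, 0.5]
estimate = permutation_shapley(lambda S: sum(weights[j] for j in S), 3, n_pairs=5)
```

Each ordering telescopes from v(empty set) to v(all features), so the estimates always sum to v(all features) - v(empty set) regardless of how many permutations are sampled.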
TODO
[ ] Implement the permutation sampling in Rcpp.
[ ] Implement the looping over Tmu/Tmx in Rcpp
[ ] Add MSE computation? We don't have v(S) directly computed, and probably don't want to compute it either, but can we simplify the MSE computation in this case under the assumption that the model is linear, without assuming the features are Gaussian (in practice)? I think that might be possible; look at the formulas to verify this. Then we can decide whether it is worth implementing.
[ ] Implement grouping. I guess the best way to do this, is to sample group permutations first, and then translate these to the appropriate
[ ] Update the vignette with an example of how to use the method.