NorskRegnesentral / shapr

Explaining the output of machine learning models with more accurately estimated Shapley values
https://norskregnesentral.github.io/shapr/

Restructure explain() for iterative estimation with convergence detection, verbose arguments ++ #396

Closed martinju closed 1 month ago

martinju commented 5 months ago

Very early draft. Lots of cleanup and moving things around remains, but the general overall structure will probably be close to what we got here.

To be done in this PR (some may be removed here and handled in separate PRs):

Note: All non-exact methods fail now (including the Shapley value estimates), since shapley_setup is now called after setup_approach. All tests for Shapley values pass if these calls are put back in the original order (but we don't want that in the future).

martinju commented 4 months ago

Just some notes for myself on where to catch up after the holiday:

  1. Carefully check that the structure and content of the output from explain is as we want it to be at the current stage.
  2. Accept all tests with the new structure, so that future edits can be checked against the tests.
  3. Add some basic tests for the new convergence stuff.
  4. Consider updating all tests with the new defaults (paired sampling and reweighting).
  5. THEN go ahead and add the other features: parallelization of the bootstrapping, verbose argument, disk saving, same functionality for groups, etc.
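To make the "convergence stuff" and the bootstrapping in the list above a bit more concrete, here is a minimal, generic sketch of iterative estimation with a bootstrap-based stopping rule. It is written in Python purely for illustration and is not the code in this PR: the names `iterative_estimate`, `bootstrap_sd`, `noisy_batch`, the tolerance `tol` and the stopping criterion itself are all assumptions, and the quantity being estimated is just the mean of a noisy stream rather than a Shapley value.

```python
import random
import statistics


def bootstrap_sd(samples, n_boot, rng):
    """Standard deviation of the sample mean across bootstrap resamples."""
    n = len(samples)
    means = []
    for _ in range(n_boot):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    return statistics.stdev(means)


def iterative_estimate(draw_batch, tol=0.05, max_iter=50, n_boot=100, seed=1):
    """Add one batch per iteration until the bootstrap sd drops below tol.

    The stopping rule (sd < tol) is a hypothetical criterion chosen for
    this sketch, not the one used in the package.
    """
    rng = random.Random(seed)
    samples = []
    history = []  # one (iteration, n_samples, estimate, sd) tuple per iteration
    for iteration in range(1, max_iter + 1):
        samples.extend(draw_batch(rng))
        estimate = sum(samples) / len(samples)
        sd = bootstrap_sd(samples, n_boot, rng)
        history.append((iteration, len(samples), estimate, sd))
        if sd < tol:
            return estimate, sd, True, history
    return estimate, sd, False, history


def noisy_batch(rng, size=50):
    """Toy stand-in for one round of sampling: noisy draws around 1.0."""
    return [rng.gauss(1.0, 1.0) for _ in range(size)]


estimate, sd, converged, history = iterative_estimate(noisy_batch)
print(f"converged={converged} after {len(history)} iterations, "
      f"estimate={estimate:.3f}, bootstrap sd={sd:.3f}")
```

The bootstrap resamples inside `bootstrap_sd` are independent of each other, which is why that step is a natural candidate for parallelization.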

Depending on how things go as I get back to this in August, @LHBO might take a look at the code structure some time after point 3 is done.

martinju commented 3 months ago

Slowly reaching a steady state. Apart from the list of undone tasks above, here is a list of components that are currently not in a good state

martinju commented 3 months ago

@LHBO OK, some more work done now. There are a lot of minor code changes, as I have moved from combinations to coalitions everywhere, i.e. changed n_combinations to n_coalitions, id_combinations -> id_coalitions, and so on. The key data.table X also got new (general) column names: features -> coalitions, n_features -> coalition_size, and so on. Note that features is still added as a new column at the end, making it easier to create the binary matrix S for both groups and features. I have also added a few extra helper parameters: n_shapley_values is equal to n_features for features and n_groups for groups, and coal_feature_list is the same as the previous group_num, except that it also exists for feature-wise explanation.

As a consequence of the name generalizations and helper parameters, I have removed many of the almost-identical functions that were specific to groups (the old feature_exact/group_exact). These now share the common name exact_coalition_table. Similar generalizations are done elsewhere.
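For readers who have not seen the old structure, the generalization described above can be sketched roughly as follows. This is an illustrative Python sketch, not the package's R implementation: the function names `exact_coalition_table_sketch` and `binary_matrix` are hypothetical, and the dictionary keys only loosely mirror the column names mentioned in the comment. The point it shows is that one function can serve both feature-wise and group-wise explanation once coal_feature_list maps each coalition member to its features.

```python
from itertools import combinations


def exact_coalition_table_sketch(n_features, groups=None):
    """Enumerate all coalitions for features (groups=None) or for groups.

    groups is an optional list of lists of 1-based feature indices.
    """
    if groups is None:
        # Feature-wise: every coalition member is a single feature.
        coal_feature_list = [[j] for j in range(1, n_features + 1)]
    else:
        coal_feature_list = [sorted(g) for g in groups]
    n_shapley_values = len(coal_feature_list)

    rows = []
    for size in range(n_shapley_values + 1):
        for coalition in combinations(range(1, n_shapley_values + 1), size):
            # Expand the members (features or groups) into plain features.
            features = sorted(
                f for member in coalition for f in coal_feature_list[member - 1]
            )
            rows.append({
                "id_coalition": len(rows) + 1,
                "coalitions": list(coalition),
                "coalition_size": size,
                "features": features,
            })
    return rows, n_shapley_values


def binary_matrix(rows, n_features):
    """Binary matrix S: one row per coalition, one column per feature."""
    return [
        [1 if j in row["features"] else 0 for j in range(1, n_features + 1)]
        for row in rows
    ]


feature_rows, n_sv_features = exact_coalition_table_sketch(3)
group_rows, n_sv_groups = exact_coalition_table_sketch(3, groups=[[1, 2], [3]])
S_groups = binary_matrix(group_rows, 3)
print(n_sv_features, n_sv_groups)
print(S_groups)
```

Because the features column is always filled in, `binary_matrix` does not need to know whether the table was built for features or for groups.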

Tests are updated after the changes. Something is wrong with groups for forecast, but we'll just ignore that for now.

Feel free to take a quick look at the main components (no need to look at the details at this stage), and let me know if you have comments to the generalizations, name changes etc.

martinju commented 1 month ago

Hi @aredelmeier @LHBO @jonlachmann

This is closing in on a merge. Just a few things missing now, I think. I hope to be able to merge this some time this weekend.

Here is what remains:

@jonlachmann You may want to merge this into your forecast fixing branch.

@LHBO If you want to, I actually think you can safely start on the asymmetric stuff from the current stage. What remains will not change much of the code.