NorskRegnesentral / shapr

Explaining the output of machine learning models with more accurately estimated Shapley values
https://norskregnesentral.github.io/shapr/

Adding the VAEAC approach #369

Closed by LHBO 6 months ago

LHBO commented 9 months ago

In this pull request, we add the vaeac (variational autoencoder with arbitrary conditioning) approach for estimating the conditional Shapley values. The vaeac approach allows for imputing missing values for arbitrary coalitions using a single variational autoencoder. Additionally, the approach supports mixed data. Finally, we have implemented the approach in torch for R.

The vaeac approach was introduced by Olsen et al. (2022), but they only provided a Python implementation built on top of the shapr package.

Key points in this pull request are the following:

  1. Implemented the vaeac methodology in native torch in R.
  2. Incorporated the vaeac approach into the shapr package as an additional method that supports mixed data.
  3. Added a verbose level to shapr, which lets the user control how many messages are printed. This is in addition to progressr's progress bar.
  4. Added functionality for sending a pre-trained vaeac model to explain() and for continuing to train the vaeac model for additional epochs if a user is unsatisfied with the accuracy.
  5. Added plot functions for monitoring the performance of the vaeac approach.
  6. Added plot_SV_several_approaches(), which allows us to plot and compare several Shapley value explanations for the same explicands simultaneously.
  7. Added a vignette specifically for the vaeac approach to demonstrate its capabilities and how to use it, in addition to some extra sections/paragraphs in the main vignette.
  8. Changed how the vignettes are built to reduce build time. They now have to be built manually.
  9. Added GPU support. However, the CPU is often comparable to or faster than the GPU for smaller datasets.
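
To illustrate what "imputing missing values for arbitrary coalitions" means in practice, here is a minimal, self-contained Python sketch. It is not code from this pull request and does not use the shapr API. The names `mean_imputer`, `shapley_values`, and `predict` are hypothetical, and the imputer is a deliberately simple stand-in (training means) rather than a vaeac. The point is only that one imputer function accepts any coalition mask, which is the role the trained vaeac plays.

```python
from itertools import combinations
from math import factorial

import numpy as np


def mean_imputer(x, mask, x_train):
    """Hypothetical stand-in for a trained vaeac.

    mask[j] is True when feature j is in the coalition (observed).
    Unobserved features are set to the training mean, which ignores
    feature dependence; a vaeac would instead sample them conditionally
    on the observed features. Returns an array of shape (1, n_features).
    """
    filled = np.array(x, dtype=float)
    filled[~mask] = x_train.mean(axis=0)[~mask]
    return filled[None, :]


def shapley_values(predict, x, impute):
    """Exact Shapley values by enumerating all coalitions (small m only)."""
    m = len(x)

    def value(coalition):
        # v(S): mean prediction when only the features in S are observed.
        mask = np.array([k in coalition for k in range(m)])
        return predict(impute(x, mask)).mean()

    phi = np.zeros(m)
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


rng = np.random.default_rng(0)
x_train = rng.normal(size=(200, 3))
weights = np.array([1.5, -2.0, 0.5])


def predict(data):
    # Toy linear model standing in for the model being explained.
    return data @ weights + 1.0


x = np.array([0.3, -1.2, 2.0])
phi = shapley_values(predict, x, lambda obs, mask: mean_imputer(obs, mask, x_train))
print(phi)
```

With this toy linear model and the mean imputer, each value reduces to `weights[j] * (x[j] - x_train.mean(axis=0)[j])`, which makes the sketch easy to check by hand. Swapping in a conditional imputer changes only the `impute` argument.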

martinju commented 9 months ago

Ok, did not realize something is up with the OLD vignette, lines 831-851. Will look at this tomorrow.

martinju commented 9 months ago

TODO to make checks pass:

LHBO commented 8 months ago

TODO to make checks pass:

  • [x] Install torch
  • [x] LARS: Move from the Boston MASS dataset to airquality (and rather make some variables categorical, as in the tests). If we use the MASS package in the vignette, we need it as an import in DESCRIPTION, which we want to avoid
  • [x] LARS: plot_vaeac_training_evaluation does not exist? For now I just removed it from the vignette so that it builds. This needs to be handled properly.
  • [x] LARS: Add a vaeac.save_model option to explain which
  • [x] Fix NSE warnings in checks
  • [x] LARS: Fix check warnings:

    • [x] Missing link or links in documentation of:

    • [x] SkipConnection: get_imputation_networks,

    • [x] Specified_masks_mask_generator: ‘row1, row1, row2, row2, row3, row3, ...’

    • [x] Specified_prob_mask_generator: ‘row1, row1, row2, row2, row3, row3, ...’

    • [x] Undocumented arguments in documentation object 'vaeac_continue_train_model': ‘explanation’ ‘...’

  • [x] LARS: model.cuda in L1362 of approach.vaeac.R does not exist. What is the intention there?

Fixed all of these points. However, I am unsure if the package is installable here on Git. It runs fine on my Mac.

LHBO commented 7 months ago

I fixed all comments sent by email and above. I refactored almost all the code and made separate functions for code that was identical in several places. We need to discuss verbose, but that should be done in a separate PR once we decide what the different levels actually mean.

Checked that all tests run (the check files must be updated) and added some global variables to the zzz file.

LHBO commented 7 months ago

We must remember to update the text in the main vignette. E.g., in the "examples" section, vaeac is not mentioned as a possible method.