emdgroup / baybe

Bayesian Optimization and Design of Experiments
https://emdgroup.github.io/baybe/
Apache License 2.0

Upcoming Diagnostics Package #357

Open Scienfitz opened 2 weeks ago

Scienfitz commented 2 weeks ago

Opening this issue to collect and plan around the upcoming diagnostics package.

After #355 is merged, the last fitted surrogate model will be available. Since we deal with Bayesian models, our model exposes a posterior method, and applying .mean turns it into a model essentially comparable to standard predictive models. Our last refactoring ensured that the input to that model can be in experimental representation, i.e. the same format accepted by add_measurements, which means it can include the unencoded labels etc.

Preliminary Example

@AdrianSosic already shared how to utilize this in the PR; for easier access, I will copy the crucial part of his example here:

import shap

# Assuming we already have a campaign created and measurements added
data = campaign.measurements[[p.name for p in campaign.parameters]]
model = lambda x: campaign.get_surrogate().posterior(x).mean

# Apply SHAP
explainer = shap.Explainer(model, data)
shap_values = explainer(data)
shap.plots.bar(shap_values)

@brandon-holt @alex6022 tagging you here so you have this simple example ready to go once #355 is merged. Questions or feedback on this application should ideally be collected here and not in the PR.

Turning this into a diagnostics package

Since the very start of this package we have had requests for diagnostics. SHAP is one of them, but more traditional users might also want to look at classical metrics such as goodness of fit.

Essentially, we are now proposing to turn the above code into a subpackage that can be used like this:

from baybe.diagnostics import FeatureImportance
from shap import PermutationExplainer

FeatureImportance(my_campaign, explainer=PermutationExplainer)  # produces the same bar plot / SHAP results as above

Note that SHAP explainers seem to have a common interface (model, data), which means we can allow any explainer importable via shap.explainers or shap.explainers.other. The latter even offers non-SHAP methods such as LIME and MAPLE via the same interface.
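
To illustrate how that common (model, data) interface could be exploited, here is a minimal sketch of what such a utility might look like internally. The function name, its signature, and the plot flag are purely illustrative assumptions, nothing here is decided yet:

import shap

def feature_importance(campaign, explainer_cls=shap.Explainer, plot=True):
    """Hypothetical helper: explain a campaign's surrogate with any (model, data)-style explainer."""
    # Wrap the surrogate's posterior mean as a plain predictive model (same trick as above).
    model = lambda x: campaign.get_surrogate().posterior(x).mean

    # Use the campaign's measurements in experimental representation as background data.
    data = campaign.measurements[[p.name for p in campaign.parameters]]

    # Any explainer class following the common (model, data) interface can be plugged in here.
    explainer = explainer_cls(model, data)
    shap_values = explainer(data)

    if plot:
        shap.plots.bar(shap_values)
    return shap_values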

On the other hand, we could also have things like:

from baybe.diagnostics import LackOfFitTest

LackOfFitTest(my_campaign)
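
As a rough illustration of what such a goodness-of-fit style diagnostic could compute internally (the function name and the explicit target_name argument are assumptions for the sketch, not an agreed design), it could simply compare the surrogate's posterior mean predictions against the observed measurements:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def lack_of_fit(campaign, target_name):
    """Hypothetical helper: compare surrogate predictions against observed measurements."""
    # Inputs in experimental representation, observed target values from the measurements.
    inputs = campaign.measurements[[p.name for p in campaign.parameters]]
    observed = campaign.measurements[target_name].to_numpy()

    # Posterior mean of the last fitted surrogate, as in the SHAP example above
    # (converting to a plain numpy array may need a .detach() depending on the backend).
    predicted = np.asarray(campaign.get_surrogate().posterior(inputs).mean).reshape(-1)

    return {
        "r2": r2_score(observed, predicted),
        "rmse": float(np.sqrt(mean_squared_error(observed, predicted))),
    }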

Open Questions

For @AdrianSosic: as you are going on a longer absence, it would be good to have your definitive decisions on these questions before then.

  1. Are you OK with the above structure, or do you propose any alternatives or see issues?
  2. For instance, should the utilities already do the plots, only provide the SHAP values, or should plotting be an optional argument to the utility?
  3. Should the utilities accept campaigns, recommenders, or both?
  4. This requires additional dependencies, at the very least shap, but probably others for the non-SHAP methods (which might still come as part of the shap package, see above). I would group all of those into a new optional dependency group diagnostics. Or do we need this more fine-grained, or should we even let it fail at runtime?
Alex6022 commented 2 weeks ago

I really like this approach. There could be multiple reasons why decoupling the plotting from the calculation of the SHAP values might be preferred:

  1. Especially for non-tree-based models and large parameter spaces, the calculations can be quite expensive. It might be beneficial to save the SHAP values, since they cannot otherwise be serialized as part of any other object.
  2. The SHAP package provides plenty of plotting options (e.g. beeswarm plots, but also dependence scatter plots for a single feature) between which the user could choose after the calculation. To improve usability, these plotting functions could also be wrapped within the diagnostics package (see the sketch after this list).
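
To make the decoupling concrete, a small sketch along these lines (reusing model and data from the example above; pickling the explanation object is an assumption that would need to be verified) would compute the SHAP values once and leave the choice of visualization, and whether to plot at all, to the user:

import pickle

import shap

# Compute once (potentially expensive), reusing model and data from the example above.
explainer = shap.Explainer(model, data)
shap_values = explainer(data)

# Persist the raw explanation so it does not have to be recomputed later,
# e.g. because it cannot be serialized as part of the campaign itself.
with open("shap_values.pkl", "wb") as f:
    pickle.dump(shap_values, f)

# Choose the visualization afterwards, independently of the computation.
shap.plots.bar(shap_values)
shap.plots.beeswarm(shap_values)
shap.plots.scatter(shap_values[:, 0])
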
AdrianSosic commented 1 week ago

Thanks, @Scienfitz, for the nice summary! I think it is very difficult to provide definitive answers to all your questions at the current point, since there are still so many unclear parts, like

Anyway, only time will show us what is needed, so I think let's just start with one or two methods in mind and adapt from there. So regarding your questions:

For @AdrianSosic: as you are going on a longer absence, it would be good to have your definitive decisions on these questions before then.

  1. Are you OK with the above structure, or do you propose any alternatives or see issues?
  2. For instance, should the utilities already do the plots, only provide the SHAP values, or should plotting be an optional argument to the utility?
  3. Should the utilities accept campaigns, recommenders, or both?
  4. This requires additional dependencies, at the very least shap, but probably others for the non-SHAP methods (which might still come as part of the shap package, see above). I would group all of those into a new optional dependency group diagnostics. Or do we need this more fine-grained, or should we even let it fail at runtime?
  1. Seems reasonable for now, potentially adding the subpackages I mentioned above. What needs to be fleshed out is how to provide (approach-specific?) context information. For instance, in my SHAP example from above, I used the same dataframe to create the Explainer and to evaluate it. These are generally two different things.
  2. Agree with @Alex6022: I would always try to separate computation and visualization as much as possible. There are probably several options for how we could achieve this, e.g. by returning an object that offers a plot method (see the sketch after this list).
  3. I'm in favor of always going down to the lowest level required (less coupling, more modular, better testability). In this case: feature importance is only relevant in the context of models, and for baybe this effectively means it should expect a surrogate model. We can then always easily build high-level access around it, like I did for the surrogate itself, i.e. something like a Campaign.explain() method that internally takes care of collecting the relevant bits and pieces.
  4. Yep, agree. Let's simply go with diagnostics for now and extend later if needed.
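
To make the layering from points 2 and 3 concrete, here is a rough, entirely hypothetical sketch (all names are illustrative and nothing here is decided): a low-level utility that expects a surrogate plus background data and returns a result object with a separate plot method, and a thin campaign-level convenience wrapper around it:

from dataclasses import dataclass

import shap

@dataclass
class FeatureImportanceResult:
    """Hypothetical result object: holds the computed SHAP values; plotting is a separate step."""
    shap_values: shap.Explanation

    def plot(self):
        shap.plots.bar(self.shap_values)

def explain_surrogate(surrogate, data, explainer_cls=shap.Explainer):
    """Low-level utility: works directly on a surrogate model and background data."""
    model = lambda x: surrogate.posterior(x).mean
    explainer = explainer_cls(model, data)
    return FeatureImportanceResult(explainer(data))

# Hypothetical high-level convenience wrapper around the low-level utility,
# collecting the relevant pieces from a campaign (in the spirit of Campaign.explain()).
def explain_campaign(campaign, **kwargs):
    data = campaign.measurements[[p.name for p in campaign.parameters]]
    return explain_surrogate(campaign.get_surrogate(), data, **kwargs)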