ersilia-os / zaira-chem

Automated QSAR based on multiple small molecule descriptors
GNU General Public License v3.0

Model interpretability roadmap #35

Open miquelduranfrigola opened 4 months ago

miquelduranfrigola commented 4 months ago

Adding an interpretability module to ZairaChem

Background

This project is related to @HellenNamulinda's MSc thesis at Makerere University. The thesis is co-supervised by Dr. Joyce Nakatumba-Nabende. At the moment, ZairaChem does not have any explainable AI (XAI) capabilities. The goal of this project is to develop an automated tool for model interpretability that can be incorporated into ZairaChem. While there are many approaches for chemistry, here we will focus on the following:

  1. A limited set of molecular descriptors: We need to focus on descriptors that a medicinal chemist would understand. Therefore, we will focus on a set of commonly used descriptors. This post can offer some guidance.
  2. XGBoost/CatBoost regression: These are tree-based methods that work well in many scenarios, especially for regression. Automatic hyperparameter tuning can be achieved with Optuna.
  3. Shapley analysis: Shapley values work naturally well with tree-based methods. We will focus on this approach to interpretability.
  4. We will validate the tool in the context of a dataset donated by Medicines for Malaria Venture (MMV) to Ersilia.
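As a rough illustration of what a Shapley analysis measures, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. It is not the xai4chem implementation: the toy model, the three descriptor names, and the baseline are all made up for the example, and in practice a tree-specific explainer (for example the `shap` package's `TreeExplainer`) would be used instead of brute force.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features absent from a coalition are set to their baseline value.
    Cost grows as 2^n, so this only suits a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical "model" over three descriptors: logP, MW/100, H-bond donors.
def toy_model(z):
    logp, mw, hbd = z
    return 2.0 * logp + 0.5 * mw - 1.0 * hbd + 0.3 * logp * hbd


x = [3.0, 4.0, 2.0]          # molecule to explain (illustrative values)
baseline = [1.0, 3.0, 1.0]   # reference molecule (illustrative values)
phi = shapley_values(toy_model, x, baseline)
```

The values in `phi` sum to `toy_model(x) - toy_model(baseline)`, which is the property that makes bar, beeswarm, and waterfall plots additive per molecule.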

Objectives

  1. To develop a standalone Python tool for chemistry specifically oriented to Shapley value analysis of molecular descriptors. The tool is called xai4chem.
  2. To apply the tool to the MMV dataset.
  3. To incorporate the tool into ZairaChem.

Steps

FAQ

Where do we create issues?

Most issues related to this work should be created in the xai4chem repository. When we reach the point of integrating into ZairaChem, we can create issues there as needed.

Is there a more comprehensive description of the project available?

Yes. This is part of @HellenNamulinda's MSc project and she is writing a thesis accordingly. A project proposal document is already available.

HellenNamulinda commented 3 months ago

Progress Updates

Next

miquelduranfrigola commented 3 months ago

Thanks @HellenNamulinda - this is useful.

We did not discuss Optuna much. Does it work as expected for you? Do you get better results than using XGBoost with default parameters?

HellenNamulinda commented 3 months ago

Hello @miquelduranfrigola, I apologize for the delay in providing this update.

I've compared XGBoost's performance with default parameters against the parameters optimized by Optuna. Surprisingly, the default parameters seem to yield better results, with an R2 of 0.50 compared to around 0.3 achieved with Optuna. I will add these findings to the notebook that I will upload later today.

It's worth noting that the parameters optimized by Optuna can vary in each study, introducing some uncertainty in the results. I may need to adjust the search space to align more closely with the default parameters.

Additionally, I've observed that training a CatBoost model is consistently slower, which prolongs the optimization process with Optuna.

However, I believe Optuna is still valuable. It's essential to carefully define the search parameters to achieve optimal results.

miquelduranfrigola commented 3 months ago

Thanks @HellenNamulinda , this is useful. I agree we need to use optuna. We'll have to play a bit with the search space, then, and perhaps increase the number of iterations.
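One way to keep a tuned model from scoring below the defaults on the validation objective is to always evaluate the default configuration as the first trial. The sketch below shows the idea with a plain random search; it is an assumption about how this could be done, not what xai4chem does. The parameter names, ranges, and the `fake_validation_r2` objective are placeholders. Optuna offers a comparable mechanism through `study.enqueue_trial`.

```python
import random

# Illustrative defaults and search ranges; adjust to the model in use.
DEFAULTS = {"learning_rate": 0.3, "max_depth": 6}
SEARCH_SPACE = {"learning_rate": (0.05, 0.6), "max_depth": (3, 10)}


def sample_params(rng):
    """Draw one configuration uniformly from the search space."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    d_lo, d_hi = SEARCH_SPACE["max_depth"]
    return {
        "learning_rate": rng.uniform(lr_lo, lr_hi),
        "max_depth": rng.randint(d_lo, d_hi),
    }


def tune(objective, n_trials, seed=0):
    """Random search that evaluates the defaults before any sampled trial."""
    rng = random.Random(seed)
    best_params, best_score = dict(DEFAULTS), objective(DEFAULTS)
    for _ in range(n_trials - 1):
        params = sample_params(rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder for a cross-validated R2; replace with real model training.
def fake_validation_r2(params):
    return (
        0.5
        - (params["learning_rate"] - 0.25) ** 2
        - 0.01 * abs(params["max_depth"] - 5)
    )


best_params, best_score = tune(fake_validation_r2, n_trials=50)
```

Because the defaults are always scored, `best_score` can never fall below the default score on the same validation split, which makes a narrow or poorly chosen search space easier to diagnose.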

HellenNamulinda commented 3 months ago

Progress Updates

Next From the meeting,

miquelduranfrigola commented 3 months ago

Thanks @HellenNamulinda , all next steps sound good to me.

HellenNamulinda commented 2 months ago

From the meetings,

@miquelduranfrigola, from the experiments on the MMV dataset with feature selection, Mordred gives an r2_score of 0.39 and an MAE of 11.73, compared to RDKit's r2_score of 0.35 and MAE of 12.24. On the TDC benchmarks the pipeline's performance is also promising, with some datasets placing in the top 3 of the TDC leaderboards. More information is in the slides.
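For reference, the two metrics quoted throughout this thread can be computed as follows. This is a small self-contained sketch with made-up numbers, not data from the MMV or TDC experiments.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Lower MAE and higher r2_score are better, so the two metrics can rank descriptor sets in the same order, as they do in the comparison above.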

We agreed to run a comparison experiment using RDKit descriptors without feature selection (that is, using all the descriptors).

Note: All the experiments were done using CatBoost with default parameters.

With zero-shot, XGBoost's performance on the pf_3d7_ic50 data improved from an r2_score of 0.64 to 0.71. I am going to finalize testing this, and then we can use zero-shot in place of Optuna.

HellenNamulinda commented 1 month ago

Last week we agreed to use Morgan fingerprints, and this has been implemented (https://github.com/ersilia-os/xai4chem/pull/11/commits/d2ff5800b5db308fa8ab56be4ea9ae081b4e91e6).

We also agreed to use zero-shot AutoML (https://github.com/ersilia-os/xai4chem/pull/11/commits/f5d5ad808197f99a4f01b7ce8811e18b4e5e4c01). However, FLAML zero-shot only supports XGBoost, not CatBoost.

HellenNamulinda commented 1 month ago

To be able to interpret other trained models besides the regression models developed using xai4chem, it was best to have explain_model as a separate module (independent of the regressor).

With the explain_module, interpretability plots can be generated even for trained classification models.

HellenNamulinda commented 3 weeks ago

Hello @miquelduranfrigola, in the meeting we reviewed everything that has been implemented in xai4chem.

In our pipeline, the features can be any of the three descriptor sets (small (Datamol), mid-size (RDKit), and large (Mordred)) or the count-based Morgan fingerprints. Feature selection automatically selects the k most relevant features during training if a value of k is given.
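To make the "select k features" step concrete, here is a hedged sketch of one simple filter: rank each feature by its absolute Pearson correlation with the target and keep the top k. The thread does not say which selection method xai4chem uses, so the scoring rule, the function names, and the descriptor values below are all assumptions for illustration.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 for a constant column."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / (var_x * var_y) ** 0.5


def select_k_best(columns, y, k):
    """Names of the k columns most correlated (in absolute value) with y."""
    scores = {name: abs(pearson(values, y)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up descriptor table: one list of values per descriptor.
columns = {
    "logp": [1.0, 2.0, 3.0, 4.0],
    "n_rings": [2.0, 2.0, 2.0, 2.0],  # constant, so it ranks last
    "tpsa": [40.0, 10.0, 30.0, 20.0],
}
y = [1.1, 1.9, 3.2, 3.9]
selected = select_k_best(columns, y, k=2)
```

Whatever the scoring rule, the selected column names need to be stored with the trained model so that the same k features are used at prediction and explanation time.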

For interpretability, we currently save three plots: a bar plot, a beeswarm plot, and a waterfall plot for the first data sample (this can be generated for other samples).

All the other usage details are documented in the README.

Some pending concerns:

  1. Save test results as a csv file (not joblib).
  2. Save Shapley values as a csv file (not only interpretability plots).
  3. Add interpretation onto chemical structures for Morgan fingerprints?

Benchmark: with the xai4chem pipeline on the PPBR_AZ dataset (Plasma Protein Binding Rate, %), using RDKit descriptors, an MAE of 7.618 and r2 = 0.3401 place 2nd on the leaderboard.

MMV data: we started with a small set (LDH assay: 4816 samples). Performance ranges between 0.36 and 0.40 (r2_score) and between 12.10 and 11.53 (MAE).

This brings us to the question of combining descriptors and fingerprints. What would you advise on using 2D feature maps (descriptors and fingerprints) as input features? I haven't implemented it yet, but it is something I have started looking at.

miquelduranfrigola commented 3 weeks ago

Thanks @HellenNamulinda — very informative.

Let's first close the pending concerns and then we will look into whether or not to blend descriptors and fingerprints.

HellenNamulinda commented 1 week ago

@miquelduranfrigola, for the test results, we are saving a csv file containing the SMILES strings and the model output values. For the interpretability results csv file, we are saving the descriptors/fingerprints and the Shapley values.
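A minimal sketch of the two csv layouts described above, using only the standard library. The column names (`smiles`, `prediction`, `<feature>_shap`) and the numbers are assumptions for illustration; the actual xai4chem output format may differ.

```python
import csv
import io


def write_predictions(handle, smiles, predictions):
    """One row per molecule: SMILES string and model output."""
    writer = csv.writer(handle)
    writer.writerow(["smiles", "prediction"])
    writer.writerows(zip(smiles, predictions))


def write_shap_table(handle, smiles, feature_names, feature_values, shap_values):
    """One row per molecule: each feature's value next to its Shapley value."""
    header = ["smiles"]
    for name in feature_names:
        header += [name, f"{name}_shap"]
    writer = csv.writer(handle)
    writer.writerow(header)
    for smi, values, shaps in zip(smiles, feature_values, shap_values):
        row = [smi]
        for v, s in zip(values, shaps):
            row += [v, s]
        writer.writerow(row)


# Illustrative data; in-memory buffers stand in for files on disk.
smiles = ["CCO", "c1ccccc1"]
pred_buffer = io.StringIO()
write_predictions(pred_buffer, smiles, [12.5, 48.0])
predictions_csv = pred_buffer.getvalue()

shap_buffer = io.StringIO()
write_shap_table(
    shap_buffer,
    smiles,
    ["logp", "tpsa"],
    [[-0.1, 20.2], [1.7, 0.0]],
    [[0.3, -0.2], [0.9, 0.1]],
)
shap_csv = shap_buffer.getvalue()
```

Keeping the feature value and its Shapley value in adjacent columns lets the same file regenerate the plots later without reloading the model.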

This week, I'm working on mapping the interpretation (Shapley values) onto chemical structures for fingerprint features.
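One possible way to do this mapping, sketched under stated assumptions: RDKit can fill a `bitInfo` dictionary when computing Morgan fingerprints, mapping each bit id to the `(center_atom_index, radius)` pairs that set it. Given such a dictionary and a per-bit Shapley value, each bit's value can be split across its occurrences and credited to the center atoms. The bit ids and values below are invented, and crediting only the center atom (rather than the whole atom environment) is a simplification, not necessarily the approach xai4chem will take.

```python
from collections import defaultdict


def atom_contributions(bit_info, bit_shap, n_atoms):
    """Spread each fingerprint bit's Shapley value over the atoms that set it.

    bit_info: bit id -> list of (center_atom_index, radius) pairs.
    bit_shap: bit id -> Shapley value of that bit for this molecule.
    Each bit's value is split equally across its occurrences, which suits
    count-based fingerprints where a bit can fire several times.
    """
    totals = defaultdict(float)
    for bit, occurrences in bit_info.items():
        if not occurrences:
            continue
        share = bit_shap.get(bit, 0.0) / len(occurrences)
        for center_atom, _radius in occurrences:
            totals[center_atom] += share
    return [totals[i] for i in range(n_atoms)]


# Hypothetical example for a 3-atom molecule such as ethanol (CCO).
bit_info = {80: [(0, 0)], 222: [(0, 0), (1, 0)], 807: [(2, 0)]}
bit_shap = {80: 0.1, 222: -0.4, 807: 0.25}
per_atom = atom_contributions(bit_info, bit_shap, n_atoms=3)
```

The per-atom values sum to the total Shapley value of the bits present, so they can be passed as atom weights to a structure-drawing routine to color the molecule.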