ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Adverse Drug Reaction Prediction using Open TG-GATES #502

Open DhanshreeA opened 1 year ago

DhanshreeA commented 1 year ago

Model Name

Adverse Drug Reaction prediction

Model Description

Prediction of adverse drug reactions using chemical induced gene expression profile

Slug

adr-prediction

Tags

classification

Publication

https://www.frontiersin.org/articles/10.3389/fddsv.2021.768792/full

Code

https://github.com/attayeb/adr (includes model checkpoints)

License

No response

DhanshreeA commented 1 year ago

This study develops models for predicting adverse drug reactions (ADRs) from drug-induced gene expression profiles. It is a multi-label setup, i.e. one gene expression profile can be associated with several types of adverse drug reaction. The authors built 14 models in total, each corresponding to one ADR. They used the intersection of two datasets for this task, namely Open TG-GATES and a standardized FAERS. Open TG-GATES provides in vitro and in vivo toxicogenomic profiling on humans (in vitro only) and rats (both), whereas FAERS is the FDA's Adverse Event Reporting System. Interestingly, they used FAERS to build a labeled dataset: subsets of drugs were classified as having a positive or negative association with an adverse reaction (by performing a Lasso regression and keeping the threshold for treated vs control at 92%), and the gene expression profiles of those drugs from Open TG-GATES were then used as input for predictive modeling on newer drugs. The Open TG-GATES data is available as CEL files (the authors use an R package called affy to work with these), and the FAERS dataset can be loaded into a SQL server.
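To make the multi-label setup concrete, here is a minimal sketch of how one binary training set per ADR could be assembled from the drugs shared by the two sources. This is not the authors' code: the drug names, ADR names, expression values, labels, and the `build_training_sets` helper are all hypothetical, and the real pipeline (CEL file processing, the Lasso-based labeling) is not reproduced here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data standing in for processed Open TG-GATES profiles:
# drug name -> gene expression vector.
expression: Dict[str, List[float]] = {
    "drug_a": [0.1, 1.2, -0.3],
    "drug_b": [0.9, -0.4, 0.5],
    "drug_c": [-1.1, 0.2, 0.7],
}

# Hypothetical FAERS-derived labels: ADR name -> {drug: 1 (positive) or 0 (negative)}.
# A drug may appear under several ADRs, which is what makes the task multi-label.
faers_labels: Dict[str, Dict[str, int]] = {
    "adr_1": {"drug_a": 1, "drug_b": 0, "drug_d": 1},
    "adr_2": {"drug_b": 1, "drug_c": 0},
}

TrainingSet = Tuple[List[str], List[List[float]], List[int]]


def build_training_sets(
    expression: Dict[str, List[float]],
    faers_labels: Dict[str, Dict[str, int]],
) -> Dict[str, TrainingSet]:
    """Return one (drugs, X, y) binary training set per ADR.

    Only drugs present in BOTH sources are kept, mirroring the
    "intersection of two datasets" idea described above.
    """
    datasets: Dict[str, TrainingSet] = {}
    for adr, labels in faers_labels.items():
        shared = sorted(set(expression) & set(labels))
        X = [expression[drug] for drug in shared]
        y = [labels[drug] for drug in shared]
        datasets[adr] = (shared, X, y)
    return datasets


datasets = build_training_sets(expression, faers_labels)
for adr, (drugs, X, y) in datasets.items():
    print(adr, drugs, y)
```

Each entry in `datasets` could then be handed to a separate binary classifier, giving one model per ADR (14 in the study).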

miquelduranfrigola commented 1 year ago

Thanks @DhanshreeA, this is great.

As discussed, let's leave this issue open but on hold for now. The reason for deprioritising this model is that its input is gene expression data, which will not be straightforward to harmonize with the input used by the rest of the hub (currently, SMILES strings). One possibility is to add an extra mapping step through the Connectivity Map, but this would require a substantial amount of validation.

As soon as we have more models that require gene expression as input, we will go back to this issue.

DhanshreeA commented 1 year ago

I have asked the authors for help with a minimal Python-based setup for testing these models here: https://github.com/attayeb/adr/issues/1