cristian931 / T-ARDIS

This repository contains all the files necessary to download, compute and statistically validate the pairwise interactions between drug side effects and drug targets
Apache License 2.0

Combine Adverse Drug Reaction databases #2

Open Jordi-Valls opened 2 years ago

Jordi-Valls commented 2 years ago

Hi Cristiano,

First of all, thanks for your answers. I still haven't managed to install all the Python packages in the environment... I hope to get it done in the coming months.

Now I'd like to ask how you combine the ADR databases: FAERS with MEDEFFECT, OFFSIDES and SIDER. Do you combine the adverse effects by name? As far as I can see, the IDs in those databases are not compatible, right?

If you could tell me how you do it, I would appreciate it a lot.

Jordi

cristian931 commented 2 years ago

Hi Jordi, sorry to hear that you still have problems installing the packages. If I can do anything about it, I'll gladly help.

Regarding your current question, I suggest first reading the paper: https://academic.oup.com/database/article/doi/10.1093/database/baab068/6408542 (which I forgot to add to the README, yay for me).

In a nutshell, you're right: the databases list the drugs under different names and IDs, but luckily the adverse reaction names follow the MedDRA guidelines. So the main issue was to obtain a common name for the same drug by exploiting the Athena vocabularies. The SQL cleaning procedure for the FAERS database is based on another repository, https://github.com/ltscomputingllc/faersdbstats. I simply updated their code to accept the new FAERS data (theirs was written to accept data only up to 2018) and extended the procedure to the MEDEFFECT database as well.

As I suppose you know, in the FAERS and MEDEFFECT databases you can find reports for both Acetylsalicylic Acid and Aspirin, even though they refer to the same drug. With the SQL procedure, each entry in the database is mapped against different vocabularies containing the most common name for the drug and its synonyms. So Acetylsalicylic Acid will be mapped to Aspirin, Tachipirina will be mapped to Paracetamol, and so on. Moreover, the procedure also tries to "correct" the reported drug name in case of misspellings or typos; if no match can be found, the report is discarded.
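To make the mapping step concrete, here is a minimal Python sketch of the idea. It is not the actual SQL procedure: the `SYNONYM_TO_COMMON` dictionary is a tiny hypothetical stand-in for the Athena vocabularies, `normalise_drug_name` and its `cutoff` value are names and parameters made up for illustration, and the typo "correction" is approximated with the standard library's `difflib`, which may differ from how the real procedure handles misspellings.

```python
import difflib

# Hypothetical synonym vocabulary (an assumption, standing in for the Athena
# vocabularies): keys are lower-cased names or synonyms, values are the
# "common" drug name.
SYNONYM_TO_COMMON = {
    "aspirin": "Aspirin",
    "acetylsalicylic acid": "Aspirin",
    "paracetamol": "Paracetamol",
    "tachipirina": "Paracetamol",
}


def normalise_drug_name(reported, vocabulary=SYNONYM_TO_COMMON, cutoff=0.85):
    """Return the common name for a reported drug name, or None to discard it."""
    # Lower-case and collapse whitespace so trivial variants compare equal.
    key = " ".join(reported.lower().split())

    # 1. Exact match against a known name or synonym.
    if key in vocabulary:
        return vocabulary[key]

    # 2. Approximate match to absorb typos (illustrative choice of technique).
    close = difflib.get_close_matches(key, list(vocabulary), n=1, cutoff=cutoff)
    if close:
        return vocabulary[close[0]]

    # 3. No plausible match: the report would be discarded.
    return None
```

With this sketch, `normalise_drug_name("Acetylsalicylic Acid")` and the misspelled `normalise_drug_name("Asprin")` both resolve to `"Aspirin"`, while an unrecognisable string returns `None`. The `cutoff` is a trade-off: a lower value rescues more misspelled reports but risks mapping a report to the wrong drug.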

The OFFSIDES and SIDER cleaning procedure is more straightforward, since the data contained in these two databases are already curated and the drug names are already the commonly used ones.

Finally, the data collected from the different databases are combined using the drug's "common" name. NB: the cleaned name is also used later in the procedure to combine the drugs with their targets.
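As a rough illustration of that last step, the sketch below merges (drug, adverse reaction) pairs from several sources once every drug already carries its common name. The function name `combine_sources`, the data layout and the toy records are all assumptions made up for this example; they are not taken from the T-ARDIS code or from any of the databases.

```python
from collections import defaultdict


def combine_sources(sources):
    """Merge {source: [(common_drug_name, adverse_reaction), ...]} into
    {(drug, reaction): set of sources reporting that pair}."""
    combined = defaultdict(set)
    for source_name, pairs in sources.items():
        for drug, reaction in pairs:
            # The common drug name plus the MedDRA reaction term is the join key.
            combined[(drug, reaction)].add(source_name)
    return dict(combined)


# Toy, invented records for illustration only (not real database content).
sources = {
    "FAERS": [("Aspirin", "Nausea"), ("Paracetamol", "Rash")],
    "SIDER": [("Aspirin", "Nausea")],
}

combined = combine_sources(sources)
for (drug, reaction), seen_in in sorted(combined.items()):
    print(drug, reaction, sorted(seen_in))
```

Keeping the set of sources per pair makes it easy to see which associations are supported by more than one database.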

On an unrelated note, if you just need the final data, T-ARDIS is also available as a web service at http://www.bioinsilico.org/T-ARDIS/

Jordi-Valls commented 2 years ago

Great, Cristiano! Thanks for your quick response, I will review this information!

Thanks for your time

Jordi