RTXteam / RTX

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)
https://arax.ncats.io/
MIT License

Should we treat biolink:Metabolite as a drug for DTD? #1323

Closed: chunyuma closed this issue 3 years ago

chunyuma commented 3 years ago

This question is raised from issue #1321.

biolink:Metabolite is a direct child of biolink:ChemicalSubstance, but biolink:ChemicalSubstance has many other children, such as FoodAdditive, Nutrient, and ProcessedMaterial. See the list of children at https://biolink.github.io/biolink-model/docs/ChemicalSubstance.html
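To make the "child of ChemicalSubstance" relationship concrete, here is a minimal sketch of an is_a ancestry check. It assumes a hand-written, hypothetical excerpt of the Biolink hierarchy that covers only the children named above; a real pipeline would load the hierarchy from the Biolink model rather than hard-code it. `PARENT` and `is_descendant_of` are illustrative names, not RTX or Biolink APIs.

```python
from typing import Dict, Optional

# Hypothetical excerpt of the Biolink is_a hierarchy: only the ChemicalSubstance
# children mentioned in this issue. Categories not listed here are treated as roots.
PARENT: Dict[str, Optional[str]] = {
    "biolink:ChemicalSubstance": None,
    "biolink:Metabolite": "biolink:ChemicalSubstance",
    "biolink:FoodAdditive": "biolink:ChemicalSubstance",
    "biolink:Nutrient": "biolink:ChemicalSubstance",
    "biolink:ProcessedMaterial": "biolink:ChemicalSubstance",
}


def is_descendant_of(category: str, ancestor: str) -> bool:
    """Walk up the is_a chain from `category`; True if `ancestor` is reached.

    The check is reflexive, so a category counts as a descendant of itself.
    """
    current: Optional[str] = category
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)  # None for unknown categories ends the walk
    return False


for cat in ["biolink:Metabolite", "biolink:Nutrient", "biolink:Disease"]:
    print(cat, is_descendant_of(cat, "biolink:ChemicalSubstance"))
```

Under this excerpt, Metabolite and Nutrient are reported as descendants of ChemicalSubstance, while Disease (absent from the map) is not.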

Here is an example: Glycyl-Histidine (HMDB0028843) is a metabolite, so it should also count as a chemical. Although it might not be used directly as a drug, it does have some biological roles, such as physiological or cell-signaling effects.


dkoslicki commented 3 years ago

Personally, I think we should treat any child of biolink:ChemicalSubstance as a drug. Recall the example of cyclic vomiting being treated with ethanol: I would assume we shouldn't exclude food additives, nutrients, etc. from drug repurposing targets. Better to cast a wide net at the beginning and then, once potential repurposing targets are found, figure out whether they would actually be viable as drugs.

chunyuma commented 3 years ago

Hi @dkoslicki, in theory it is true that all children of chemical substance could be treated as drugs. In practice, however, we might not be able to do this because of the training data. The training data we currently use for the DTD model come mainly from MyChem, SemMedDB, and NDF. The MyChem training data are mostly CURIEs with the prefix CHEMBL.COMPOUND; the SemMedDB data are mostly CURIEs with the prefixes CHEBI, CHEMBL.COMPOUND, and MESH; the NDF data are also CHEMBL.COMPOUND. If we want to include food additives, nutrients, etc. as drugs, we also have to include them in our training data. Otherwise, the model can't learn features for these children of chemical substance.
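The coverage argument above can be checked mechanically by counting the CURIE prefixes on the drug side of the training pairs and flagging candidates whose prefix never appears. This is a minimal sketch: the pairs below are hypothetical placeholders (only their prefixes mirror the sources described above), and the function names are illustrative, not part of the RTX code base.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def prefix(curie: str) -> str:
    """Return the prefix of a CURIE, e.g. 'CHEBI' for 'CHEBI:1234'."""
    return curie.split(":", 1)[0]


def drug_prefix_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count the CURIE prefixes on the drug side of (drug, disease) training pairs."""
    return Counter(prefix(drug) for drug, _disease in pairs)


def uncovered(candidates: Iterable[str], counts: Counter) -> List[str]:
    """Return candidate drug CURIEs whose prefix never occurs in the training data."""
    return [c for c in candidates if counts[prefix(c)] == 0]


# Hypothetical placeholder pairs; the identifiers are not real training data.
training_pairs = [
    ("CHEMBL.COMPOUND:EXAMPLE1", "MONDO:EXAMPLE1"),
    ("CHEBI:EXAMPLE2", "MONDO:EXAMPLE2"),
    ("MESH:EXAMPLE3", "MONDO:EXAMPLE3"),
]
counts = drug_prefix_counts(training_pairs)
print(uncovered(["CHEBI:EXAMPLE4", "KEGG:EXAMPLE5"], counts))
```

A prefix with zero training examples, such as the KEGG placeholder here, is a strong hint that the model has learned nothing specific about that node type.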

I don't know whether our training data (MyChem, SemMedDB, and NDF) already contain pairs of food additives or nutrients with diseases, since KG2.5.2 doesn't have these categories yet. However, I checked the provider-source distribution of biolink:Metabolite nodes in KG2.5.2c: they come mainly from KEGG, and our training data don't include KEGG.

[Screenshot: provider-source distribution of biolink:Metabolite nodes in KG2.5.2c]

To validate whether the model has predictive power for CURIEs outside biolink:Drug and biolink:ChemicalSubstance, we need to add a set of drug-disease pairs involving such CURIEs to this plot:

[Plot image]
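Selecting the extra validation pairs amounts to filtering labeled pairs by the category of their drug-side node. The sketch below assumes a hypothetical `LabeledPair` record with a single category per drug node and uses placeholder identifiers; it illustrates the filtering step only, not how the plot itself is produced.

```python
from dataclasses import dataclass
from typing import List

# Categories already represented in the existing evaluation plot.
COVERED = {"biolink:Drug", "biolink:ChemicalSubstance"}


@dataclass(frozen=True)
class LabeledPair:
    """Hypothetical record for one drug-disease pair with a known label."""
    drug_curie: str
    drug_category: str
    disease_curie: str
    treats: bool  # True if the drug is known to treat the disease


def outside_covered(pairs: List[LabeledPair]) -> List[LabeledPair]:
    """Keep only pairs whose drug-side node is neither a Drug nor a ChemicalSubstance."""
    return [p for p in pairs if p.drug_category not in COVERED]


# Placeholder pairs; identifiers and labels are illustrative, not real data.
pairs = [
    LabeledPair("CHEMBL.COMPOUND:EXAMPLE1", "biolink:Drug", "MONDO:EXAMPLE1", True),
    LabeledPair("HMDB:EXAMPLE2", "biolink:Metabolite", "MONDO:EXAMPLE2", False),
]
extra = outside_covered(pairs)
print([p.drug_curie for p in extra])
```

Keeping both positive and negative labels in the filtered set matters, since the model's predictive power can only be judged against pairs it should reject as well as pairs it should accept.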

chunyuma commented 3 years ago

Thanks to @dkoslicki's suggestion, we have decided to treat biolink:Metabolite, biolink:ChemicalSubstance, and biolink:Drug as general drugs in the next version of the DTD model, so this issue can be closed.
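One way to express this decision is an explicit allow-list of categories checked against each candidate node. This is a minimal sketch assuming nodes may carry several categories; `is_general_drug` is a hypothetical helper, not the actual DTD implementation.

```python
from typing import Iterable

# Categories treated as "general drugs", per the decision in this issue.
GENERAL_DRUG_CATEGORIES = frozenset({
    "biolink:Drug",
    "biolink:ChemicalSubstance",
    "biolink:Metabolite",
})


def is_general_drug(categories: Iterable[str]) -> bool:
    """True if any of the node's categories is one of the general-drug categories."""
    return any(c in GENERAL_DRUG_CATEGORIES for c in categories)


print(is_general_drug(["biolink:Metabolite"]))  # True
print(is_general_drug(["biolink:Disease"]))     # False
```

An explicit set, rather than a walk over every descendant of ChemicalSubstance, keeps other children such as Nutrient or FoodAdditive excluded until training data for them is available, which matches the three categories named in the decision.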