RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

Free text info from DrugBank #369

Open dkoslicki opened 7 months ago

dkoslicki commented 7 months ago

DrugBank has a bunch of extra information on their website, such as free-text mechanisms of action.

I'm wondering (a) whether this information exists in some downloadable part of DrugBank, and (b) whether there were particular barriers to including it when DrugBank was ETL'd for KG2.

chunyuma commented 6 months ago

Hi @dkoslicki, I think you can extract this information from the downloadable DrugBank XML file, which requires a license to download.

dkoslicki commented 6 months ago

Thanks @chunyuma. Mohsen and I found this and are working through NER, identifier extraction, and alignment for all of the fields in it. I'm optimistic this can be useful for future xDTD-style efforts.

chunyuma commented 6 months ago

Sounds great! Happy to see this will be useful. One difficulty I ran into when using this information is mapping the text to nodes in KG2; on top of that, KG2 has relatively few connections between the mapped nodes. I would be happy to share my experience with Mohsen if needed.

ecwood commented 3 months ago

It does look like this information is in the DrugBank XML download:

  <mechanism-of-action>Eliglustat is a glucosylceramide synthase inhibitor used for the treatment of type 1 Gaucher disease.[L41404] Gaucher disease is a rare genetic disorder characterized by the deficiency of acid β-glucosidase, an enzyme that converts glucosylceramide (also known as glucocerebroside) into glucose and ceramide. In patients with Gaucher disease, glucosylceramide is accumulated in the lysosomes of macrophages, leading to the formation of foam cells or Gaucher cells.[L41404] Gaucher cells infiltrate the liver, spleen, bone marrow and other organs, leading to complications such as anemia, thrombocytopenia and hepatosplenomegaly.[L41404,A246384]&#13;
&#13;
Eliglustat reduces the production of glucosylceramide by inhibiting glucosylceramide synthase, a rate-limiting enzyme in the production of glycosphingolipids.[L41404,A182192] This lowers the amount of glucosylceramide that is available in lysosomes, and balances the deficiency of acid β-glucosidase.[L41404,A246384]</mechanism-of-action>

I will have to brainstorm how to ETL it since the information inside of the free text is not linked to any other node IDs. I assume you want this information converted into edges?
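
As a rough illustration of the extraction step only (not the KG2 ETL code), here is a minimal Python sketch that pulls the mechanism-of-action text for each drug out of XML shaped like the snippet above and strips the inline reference markers such as [L41404]. The surrounding `drugbank`/`drug`/`name` layout and the `http://www.drugbank.ca` namespace are assumptions about the download, and the sample text is abbreviated from the snippet; the function name `extract_mechanisms` is made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Matches inline reference markers such as [L41404] or [L41404,A246384].
REF_MARKER = re.compile(r"\[(?:[A-Z]\d+)(?:,\s*[A-Z]\d+)*\]")


def local_name(tag):
    """Drop any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def extract_mechanisms(xml_text):
    """Return {drug name: cleaned mechanism-of-action text} for top-level drugs."""
    root = ET.fromstring(xml_text)
    result = {}
    for drug in root:
        if local_name(drug.tag) != "drug":
            continue
        name, moa = None, None
        for child in drug:
            if local_name(child.tag) == "name":
                name = child.text
            elif local_name(child.tag) == "mechanism-of-action":
                moa = child.text
        if name and moa:
            text = REF_MARKER.sub("", moa)
            # Collapse all whitespace, including the carriage returns (&#13;).
            result[name] = " ".join(text.split())
    return result


# Assumed document layout; text abbreviated from the snippet above.
sample = """<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <name>Eliglustat</name>
    <mechanism-of-action>Eliglustat is a glucosylceramide synthase inhibitor.[L41404]&#13;
&#13;
It reduces the production of glucosylceramide.[L41404,A182192]</mechanism-of-action>
  </drug>
</drugbank>"""

mechanisms = extract_mechanisms(sample)
print(mechanisms["Eliglustat"])
```

This only recovers clean text per drug. The full download is large, so a streaming parser such as `ET.iterparse` would probably be a better fit there, and the text still has no node IDs attached.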

chunyuma commented 3 months ago

Hi @ecwood, I agree: to convert this information into edges, we first need to resolve the mapping issue. The second issue is extracting the correct relation from the free text. I know this is not easy, because SemMedDB gets it wrong in some cases. An LLM may be an option, since it is likely more capable than the algorithm used for SemMedDB.
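
To make the mapping issue concrete, here is a minimal sketch of the simplest possible approach: dictionary lookup of known names in the free text, preferring the longest match. This is not what SemMedDB or KG2 does. The `NAME_TO_NODE` table, the `find_mentions` helper, and the `EXAMPLE:` identifiers are all hypothetical placeholders, and the sentence is abbreviated from the snippet above.

```python
import re

# Hypothetical lookup from lower-cased names to node IDs.
# The IDs are placeholders, not real KG2 identifiers.
NAME_TO_NODE = {
    "eliglustat": "EXAMPLE:drug-1",
    "glucosylceramide synthase": "EXAMPLE:protein-1",
    "glucosylceramide": "EXAMPLE:chemical-1",
}


def find_mentions(text, name_to_node):
    """Return (start, end, name, node_id) for longest, non-overlapping matches."""
    lowered = text.lower()
    candidates = []
    for name, node_id in name_to_node.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, lowered):
            candidates.append((m.start(), m.end(), name, node_id))
    # Consider longer matches first; break ties by earlier position.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen = []
    for cand in candidates:
        # Keep a candidate only if it does not overlap anything already chosen.
        if all(cand[1] <= c[0] or cand[0] >= c[1] for c in chosen):
            chosen.append(cand)
    return sorted(chosen)


sentence = (
    "Eliglustat reduces the production of glucosylceramide "
    "by inhibiting glucosylceramide synthase."
)
mentions = find_mentions(sentence, NAME_TO_NODE)
for start, end, name, node_id in mentions:
    print(start, end, name, node_id)
```

The longest-match rule keeps "glucosylceramide synthase" from also being tagged as the chemical "glucosylceramide". Even so, this only finds mentions; deciding which relation holds between them (for example, that the drug inhibits the enzyme) is the second, harder issue.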

dkoslicki commented 3 months ago

Unless there is a large appetite to do this, I've already done it on the side, locally (to use for Pathfinder training), so at least for my purposes I have what I need. Otherwise, I'd want to hear from the KG2 team whether they think this is worth pursuing.