NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

SmallMolecule CHEMBL.COMPOUND:CHEMBL1881825 related to NamedThing causes timeouts #23

Closed sandrine-m closed 4 months ago

sandrine-m commented 1 year ago

I am currently working on a use case involving CHEMBL.COMPOUND:CHEMBL1881825. My collaborators on that project have published several papers on this compound and its relationships to pathways, cellular processes, and cellular phenotypes (Chou DH-C et al., ACS Med Chem Lett 2011, PMID: 21927648; Chou DH-C et al., J Am Chem Soc 2015, PMID: 26042473; Vetere, Amedeo, et al., Nature Reviews Drug Discovery, PMID: 24525781).

Running the query below through ARAX, I only get edges from MolePro (chemically_similar_to and correlated_with). ars-default-agent and arax-ara run indefinitely without returning, and ara-aragorn returns Error 504. I do not see any of the information about this compound that I would expect from those publications.

Thank you for your help!

Query ID: 385f6c86-9c1f-48ee-a859-ea9aae336f46

{
  "edges": {
    "e00": {
      "object": "n01",
      "predicates": [
        "biolink:related_to"
      ],
      "subject": "n00"
    }
  },
  "nodes": {
    "n00": {
      "categories": [
        "biolink:SmallMolecule"
      ],
      "ids": [
        "CHEMBL.COMPOUND:CHEMBL1881825"
      ]
    },
    "n01": {
      "categories": [
        "biolink:NamedThing"
      ]
    }
  }
}
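For anyone reproducing this outside the UI: the JSON above is only the query graph, and TRAPI expects it wrapped as `{"message": {"query_graph": ...}}` before it is posted to a service. The sketch below builds that payload. The helper names `build_trapi_query` and `narrow_object_category` are made up for illustration, and narrowing `n01` from `biolink:NamedThing` to a more specific category such as `biolink:Pathway` is only a suggestion that might reduce the work each ARA has to do; nobody in this thread has confirmed that it avoids the timeouts.

```python
import json

# Query graph copied from the issue: one SmallMolecule related to anything.
QUERY_GRAPH = {
    "nodes": {
        "n00": {
            "categories": ["biolink:SmallMolecule"],
            "ids": ["CHEMBL.COMPOUND:CHEMBL1881825"],
        },
        "n01": {"categories": ["biolink:NamedThing"]},
    },
    "edges": {
        "e00": {
            "subject": "n00",
            "object": "n01",
            "predicates": ["biolink:related_to"],
        }
    },
}


def build_trapi_query(query_graph):
    """Wrap a query graph in the TRAPI message envelope."""
    return {"message": {"query_graph": query_graph}}


def narrow_object_category(query_graph, node_id, category):
    """Return a copy of the query graph with one node's categories replaced.

    Hypothetical helper: the original graph is left untouched.
    """
    narrowed = json.loads(json.dumps(query_graph))  # cheap deep copy
    narrowed["nodes"][node_id]["categories"] = [category]
    return narrowed


if __name__ == "__main__":
    # Assumption: biolink:Pathway is just one example of a narrower target.
    pathway_graph = narrow_object_category(QUERY_GRAPH, "n01", "biolink:Pathway")
    print(json.dumps(build_trapi_query(pathway_graph), indent=2))
```

The printed payload can be posted to whichever TRAPI endpoint is being tested; no endpoint URL is hard-coded here because none is given in the issue.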

Results:

[image]

marcdubybroad commented 1 year ago

@sandrine-m Attached is the information I pulled for the three PubMed IDs listed above, using a script that calls the service https://biothings.ncats.io/semmeddb/query?q=pmid:21927648&size=30

sandrinePapersSenmedDbUmls.csv
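The script itself is not attached, so the following is only a rough sketch of how such a lookup could be done against the URL quoted above. It builds one query URL per PMID from the issue and, optionally, fetches the results. The function names are invented for illustration, and the assumption that the response is JSON with a top-level `hits` list is based on the usual BioThings response layout rather than on anything stated in this thread.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://biothings.ncats.io/semmeddb/query"
# PMIDs of the three papers cited in the issue description.
PMIDS = ["21927648", "26042473", "24525781"]


def semmeddb_url(pmid, size=30):
    """Build the query URL for one PMID, keeping ':' unescaped as in the thread."""
    params = {"q": f"pmid:{pmid}", "size": size}
    return f"{BASE_URL}?{urllib.parse.urlencode(params, safe=':')}"


def fetch_hits(pmid, size=30, timeout=30):
    """Fetch records for one PMID.

    Assumption: the service returns JSON with a top-level "hits" list.
    """
    with urllib.request.urlopen(semmeddb_url(pmid, size), timeout=timeout) as resp:
        return json.load(resp).get("hits", [])


if __name__ == "__main__":
    for pmid in PMIDS:
        print(semmeddb_url(pmid))
```

Calling `fetch_hits(pmid)` for each entry in `PMIDS` would retrieve the records; it is left out of the main block so the sketch runs without network access.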

bill-baumgartner commented 1 year ago

Text Mining Provider currently uses CHEBI and Drugbank as its source for chemicals. CHEMBL.COMPOUND:CHEMBL1881825 does not appear in Drugbank as far as I can tell, and while it does appear in CHEBI (CHEBI:92195), the CHEBI record is not associated with the BRD0476 synonym used in the papers linked above. Because of this, I would not expect to find it in our KGs at this point. It looks like the synonym is specified in the corresponding PubChem record, so I will note that as a potential target for improving our chemical entity recognition.

sandrine-m commented 1 year ago

Hi @marcdubybroad, this is exactly the type of information I was expecting Translator to return. Is this information available to Translator? If so, which Biolink predicates/categories does it map to?

Thanks!

sierra-moxon commented 1 year ago

also split into a ticket for architecture group:

sandrine-m commented 1 year ago

Thank you @bill-baumgartner! MolePro has worked on chemical synonym reconciliation and could help with the chemical name mapping when you get a chance to revisit this.

marcdubybroad commented 1 year ago

Hi @sandrine-m, according to the infores catalog Google sheet, I believe the BTE team maintains the infores:biothings-semmeddb resource, so I would ask Colleen who the contact is for that work. While I can search for the triples associated with each PubMed ID, I don't know which edges they are linked to.

sandrine-m commented 1 year ago

Thank you so much @marcdubybroad ! I'll get in touch with Colleen!

sandrine-m commented 1 year ago

I heard back from BTE: SemMedDB does not relate the papers to any chemical.

bill-baumgartner commented 1 year ago

Assuming I'm querying it correctly, the SRI name resolver does not seem to have the chemical name (BRD0476) that @sandrine-m is looking for, so it may be that for chemicals the TMKP should see if MolePro can provide a mapping from CURIEs to synonyms.

curl -X 'POST' \
  'https://name-lookup.test.transltr.io/lookup?string=BRD0476&offset=0&limit=10' \
  -H 'accept: application/json' \
  -d ''
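
For scripting the same check over several names, the curl call above can be mirrored with the Python standard library. This is a minimal sketch: the URL and parameters are taken from the command above, the function names are made up, and nothing is assumed about the shape of the JSON that comes back.

```python
import json
import urllib.parse
import urllib.request

LOOKUP_URL = "https://name-lookup.test.transltr.io/lookup"


def build_lookup_request(name, offset=0, limit=10):
    """Build the same POST request as the curl command: empty body, query-string params."""
    query = urllib.parse.urlencode({"string": name, "offset": offset, "limit": limit})
    return urllib.request.Request(
        f"{LOOKUP_URL}?{query}",
        data=b"",  # matches curl's -d ''
        headers={"accept": "application/json"},
        method="POST",
    )


def lookup(name, timeout=30):
    """Send the request and return the parsed JSON response as-is."""
    with urllib.request.urlopen(build_lookup_request(name), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    request = build_lookup_request("BRD0476")
    print(request.get_method(), request.full_url)
```

`lookup("BRD0476")` performs the actual call; an empty result would match what is reported above.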

cbizon commented 1 year ago

Hmm, that's correct - where does that BRD0476 term come from?

We could make the full list of synonyms that backs name-resolver available for use, but if the values we want aren't in there, then ....

bill-baumgartner commented 1 year ago

So far, I've found it listed in the PubChem record.

sandrine-m commented 1 year ago

It is the public Broad ID for the compound, and MolePro knows about that compound name. @cbizon, would you be interested in incorporating the MolePro synonym list into SRI, or shall I just work with Bill separately?

cbizon commented 1 year ago

I looked into this - we do get some terms from PubChem, but we don't use the user-deposited synonyms. In the past we have found them way too messy and error-prone, so we took them out. I'm certainly interested in the MolePro synonym list, but it will also take a while to get that incorporated, so it makes sense in my mind to work with Bill directly.

bill-baumgartner commented 1 year ago

I share your concerns regarding user-submitted synonyms, Chris. @sandrine-m, do you know if there are synonym sources that we might trust more than others; for instance, would it be sufficient to import just the Broad IDs?

@sandrine-m - If MolePro can provide a file that maps from some canonical identifier to synonyms, then we can begin to incorporate these synonyms into our NLP pipelines. It would be nice to evaluate this change somehow, so perhaps we can think about how best to do that as well. Let me know how you'd like to proceed. Thanks.
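
To make the request concrete, here is one possible shape for such a file and a sketch of how it could be turned into a lookup index. Everything about the format is an assumption for discussion: a tab-separated file with hypothetical `curie` and `synonyms` columns, and synonyms separated by `|`. The single sample row uses the compound and synonym from this issue. Mapping each synonym to a set of CURIEs keeps ambiguous names visible rather than silently picking one.

```python
import csv
import io
from collections import defaultdict


def load_synonyms(tsv_file):
    """Build a case-insensitive synonym -> {CURIE, ...} index from a TSV file object.

    Assumed (hypothetical) columns: "curie" and "synonyms", with synonyms
    separated by "|".
    """
    index = defaultdict(set)
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        curie = row["curie"].strip()
        for synonym in row["synonyms"].split("|"):
            synonym = synonym.strip()
            if synonym:
                index[synonym.lower()].add(curie)
    return index


# Sample content in the assumed format, using the compound from this issue.
SAMPLE = "curie\tsynonyms\nCHEMBL.COMPOUND:CHEMBL1881825\tBRD0476\n"

index = load_synonyms(io.StringIO(SAMPLE))
ambiguous = {name for name, curies in index.items() if len(curies) > 1}

if __name__ == "__main__":
    print(sorted(index["brd0476"]))
    print(f"{len(ambiguous)} ambiguous synonyms")
```

The `ambiguous` set is one simple starting point for the evaluation mentioned above: synonyms that map to more than one identifier can be reviewed or excluded before they reach the NLP pipeline.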

sandrine-m commented 1 year ago

@bill-baumgartner I will double-check with Vlado, but I believe what we have is a curated list. It also contains manually corrected compound structures where they were entered incorrectly. I'll keep you posted. Thank you all!

sierra-moxon commented 1 year ago

@vdancik @sandrine-m - do we have a list that @bill-baumgartner can use from MolePro?

sierra-moxon commented 1 year ago

from TAQA: list is coming in the next MolePro release.

sierra-moxon commented 1 year ago

Hi @sandrine-m - is the next release of MolePro out? :) Can we close this or pass it on?

sandrine-m commented 1 year ago

Hi @sierra-moxon I do not think so, I'll let @vdancik confirm

sstemann commented 7 months ago

I ran this query via ARS Test and received results. @sandrine-m, can you take a look?

https://arax.ncats.io/?r=0781b532-5d31-4905-8da5-02b6bc20fbdf


{
  "edges": {
    "e00": {
      "object": "n01",
      "predicates": [
        "biolink:related_to"
      ],
      "subject": "n00"
    }
  },
  "nodes": {
    "n00": {
      "categories": [
        "biolink:SmallMolecule"
      ],
      "ids": [
        "CHEMBL.COMPOUND:CHEMBL1881825"
      ]
    },
    "n01": {
      "categories": [
        "biolink:NamedThing"
      ]
    }
  }
}

sierra-moxon commented 4 months ago

closing as completed; please reopen if necessary.