NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

MVP2 Input restricted to biolink:ChemicalEntity - cannot select MolecularMixture #374

Open sandrine-m opened 1 year ago

sandrine-m commented 1 year ago

This issue is related to issue #149 when retesting using Ferumoxytol/Feraheme. We cannot search for this compound anymore.


sandrine-m commented 1 year ago

I checked the SRI Name Resolver and the node for Feraheme is still present (unassigned Gaurav, sorry for the spam :D). @dnsmith124 is this a UI issue?

sierra-moxon commented 1 year ago

in ticket #109

dnsmith124 commented 1 year ago

@sandrine-m The autocomplete bar in Prod is still returning non-MONDO terms, whereas the one on CI has been updated to return only MONDO terms.

When I search for "ferahe" on CI, 5 terms are returned, but they're all UMLS, so they're thrown out, which is why no results are shown on CI!

sierra-moxon commented 1 year ago

This does seem like strange behavior for an autocomplete on a Google-like search box, where the user typically expects autocomplete to search over 'everything'. But the user selects the MVP to query as a first step, and we have a lot of names in NameResolver from UMLS and MeSH that we want to exclude, so I think we might need to change the requirements of the autocomplete for each MVP.

One idea: for autocomplete that should complete on 'gene' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with NCBIGene, Ensembl, or HGNC (taken from the top three biolink:Gene id_prefixes). It's not clear to me whether we should autocomplete on non-human gene names/symbols/synonyms, because when I try to search for non-human genes, the query takes forever and no results are returned. If we make a group decision that we should only allow searches for human genes, then we could limit this further to HGNC ids only.

For autocomplete that should complete on 'chemical' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with one of the following:

      PUBCHEM.COMPOUND
      CHEMBL.COMPOUND
      UNII
      CHEBI

(taken from biolink:SmallMolecule id_prefixes)

For autocomplete that should complete on 'disease' or 'phenotype' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with MONDO or HP. This list is more restrictive than biolink in this instance.
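
A minimal sketch of the per-category prefix filter proposed above. The category keys, the `filter_suggestions` helper, and the shape of a suggestion (a dict with a `curie` field) are assumptions made for illustration; this is not the UI's or NameResolver's actual code. Prefixes are compared case-insensitively to sidestep differences such as Ensembl vs. ENSEMBL.

```python
# Hypothetical mapping from autocomplete category to allowed CURIE prefixes,
# copied from the lists proposed in this comment (stored uppercased).
ALLOWED_PREFIXES = {
    "gene": {"NCBIGENE", "ENSEMBL", "HGNC"},
    "chemical": {"PUBCHEM.COMPOUND", "CHEMBL.COMPOUND", "UNII", "CHEBI"},
    "disease_or_phenotype": {"MONDO", "HP"},
}


def curie_prefix(curie: str) -> str:
    """Return the uppercased part of a CURIE before the first colon."""
    return curie.split(":", 1)[0].upper()


def filter_suggestions(suggestions, category):
    """Keep only suggestions whose CURIE prefix is allowed for the category."""
    allowed = ALLOWED_PREFIXES[category]
    return [s for s in suggestions if curie_prefix(s["curie"]) in allowed]


# Illustrative input: the UMLS id is a made-up placeholder.
candidates = [
    {"curie": "UMLS:C0000000", "label": "Feraheme"},
    {"curie": "CHEMBL.COMPOUND:CHEMBL1201867", "label": "ferumoxytol"},
]
chemical_hits = filter_suggestions(candidates, "chemical")
```

With a filter like this, a UMLS-only entry is dropped for every category, so a compound only appears in the chemical autocomplete once one of the listed identifiers is attached to it.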

sandrine-m commented 1 year ago

I agree with @sierra-moxon. In this particular case Feraheme has a CHEMBL.COMPOUND identifier, so it should show up in the autocomplete. See the ARAX synonym info for Feraheme.

edeutsch commented 1 year ago

Note that the "ARAX synonym" hyperlink above is to ARAX production, which is very old and that version of the Node Synonymizer was a bit too over-eager in merging concepts.

Our latest version in CI is less over-eager/more limited: https://arax.ci.transltr.io/?term=Feraheme

sandrine-m commented 1 year ago

Thank you @edeutsch! I'll test using the CI version of ARAX. As a user, I found the "ARAX synonym" hyperlink extremely useful, mostly when I cannot find a compound in the UI search field under one of the names I know for it.

gaurav commented 1 year ago

We have both Ferumoxytol and Feraheme in NameRes now, but only as UMLS identifiers, which suggests that the actual PubChem/other identifiers are missing or not cliqued properly. I'll look into whether I can figure out what the actual clique should be (if anybody knows which identifiers we expect here, that would be very helpful!)

sandrine-m commented 1 year ago

@gaurav Thanks so much! Here is what is known of MolePro on ferumoxytol (the ChEMBL id is listed in the shared json file: ChEMBL:CHEMBL1201867). Here is the json for Feraheme.

sierra-moxon commented 1 year ago

@dnsmith124 @gaurav - retesting on CI today. hox1a_screen_recording.mov.zip brca1_screen_recording.mov.zip

sstemann commented 1 year ago

@gaurav adding another drug that isn't in Node Norm: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1201823/

Name: ABATACEPT

There are also missing genes: in Test I can use MVP2 > "What chemicals upregulate SIRT1", but in CI SIRT1 is not returned.

sierra-moxon commented 1 year ago

From TAQA:
Gaurav and Andy: filtering by SmallMolecule is the issue here; we can broaden this filter in the UI. We are doing some testing here. Andy will do some more testing with the API to change settings to filter out the bad results that came back from doing the filter expansion.
Sandrine: RENCI DEV is doing much better now.
Chris B: In the future maybe we should pre-conflate the name resolver cliques to solve this problem.

Genomewide commented 1 year ago

Working out the right parameters for the chemical search is painful.
Originally, we limited it to 'smallMolecule' to get rid of answers like '1mg aspirin', '2 mg aspirin', etc.

Digging further into this, there are still some drugs that don't show up even if we allow the broader search of 'chemicalentity'.

Sampled the top 50 grossing drugs and removed 1 (n = 49). Searching for the brand name:

Summary: I have not identified a set of parameters that keeps out the drugs with 'mg' in the canonical name while also returning the maximum number of drugs actually in the system, and doing it without duplicates.

Fingers crossed for drug conflation!
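
For illustration only, the two cleanup goals described above (dropping names that contain a dosage such as '1mg' and removing duplicates) can be sketched as a post-filter on a list of names. The `clean_results` helper and the regular expression are assumptions; this does not reproduce the NameRes parameters tested here, and it says nothing about whether the remaining names cover every drug in the system.

```python
import re

# Matches a dosage such as "1mg", "2 mg", or "0.5 MG" anywhere in a name.
DOSAGE_RE = re.compile(r"\b\d+(?:\.\d+)?\s*mg\b", re.IGNORECASE)


def clean_results(names):
    """Drop names containing an 'mg' dosage, then drop case-insensitive duplicates.

    The first occurrence of each name is kept and input order is preserved.
    """
    seen = set()
    kept = []
    for name in names:
        if DOSAGE_RE.search(name):
            continue  # e.g. "1mg aspirin"
        key = name.casefold()
        if key in seen:
            continue  # already kept the same name with different casing
        seen.add(key)
        kept.append(name)
    return kept


cleaned = clean_results(["aspirin", "1mg aspirin", "2 mg aspirin", "Aspirin"])
```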

sandrine-m commented 1 year ago

Tagging @vdancik here for future follow-up on this issue regarding the Ferumoxytol/Feraheme use case (I retested the autocomplete today on CI but I am still unable to query the compound).

gaurav commented 4 months ago

Update: we talked a bit about expanding the types for the chemical autocomplete at the last Relay, and our main concern is that we'll need to do some testing to make sure we don't include duplicates or crappy results. This would fix ferumoxytol, which still can't be searched for because it's not a biolink:SmallMolecule: PUBCHEM.COMPOUND:6432052 "ferumoxytol" is the first result on NameLookup, but it is classified as a biolink:MolecularMixture. I think we might be able to get this done by Guppy.

We have some improvements to chemical names in the latest NameLookup (currently in Prod), but the way in which we do drug conflation prefers the drug name instead of the brand name (which is what @Genomewide found last year as well). I think that is what we want (see e.g. https://github.com/NCATSTranslator/Feedback/issues/461), but if not, there are some things we could do (in increasing order of difficulty):

  1. Use the Annotator Service to pull up the brand name of the drug.
  2. Try to identify the best brand name in the list of synonyms (possibly just by preferring DrugCentral or RXCUI labels, but more likely we'll have to dig deeper into RXCUI to find the brand name) and include that either instead of or in addition to the Drug name.
  3. Try to engineer conflations such that every drug shows up at least twice: a biolink:ChemicalEntity term that refers to the drug name (e.g. "acetaminophen") and a biolink:Drug term that refers to a brand name (e.g. "Tylenol"), then make sure that they conflate together with drug-chemical conflation turned on.

We should be able to come up with a plan for what exactly we want here by Guppy, but I'm not sure how much longer beyond that it will take to implement whatever technical solution we come up with.

sstemann commented 2 months ago

was there a change that is included in Guppy? I can't tell what it would be, and it seems to behave the same as it ever was in CI


gaurav commented 2 months ago

I wasn't able to meet up with UI to come up with a plan during this sprint, but I was able to extend NameRes' type filtering so that you can now search for multiple Biolink types at once, which would let UI accept e.g. biolink:Drug or biolink:SmallMolecule or biolink:MolecularMixture. So we can at least play around with options for broadening this without messing around with the cliques or anything. I'll see if I can organize a meeting with UI over the next Translator Relay to think about options for fixing this fully in Hammerhead.
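
As a rough sketch of what "any of several Biolink types" means for a candidate list, the snippet below filters locally by a set of allowed types. It does not call NameRes and does not show its request parameters; the `filter_by_types` helper and the candidate shape (a dict with a `types` list) are assumptions for illustration.

```python
# The three types mentioned above as a possible broadened filter.
ALLOWED_TYPES = frozenset(
    {"biolink:Drug", "biolink:SmallMolecule", "biolink:MolecularMixture"}
)


def filter_by_types(candidates, allowed=ALLOWED_TYPES):
    """Keep candidates that carry at least one of the allowed Biolink types."""
    return [c for c in candidates if any(t in allowed for t in c["types"])]


# Ferumoxytol as described earlier in this thread: a MolecularMixture,
# so a SmallMolecule-only filter drops it while the broadened filter keeps it.
ferumoxytol = {
    "curie": "PUBCHEM.COMPOUND:6432052",
    "label": "ferumoxytol",
    "types": ["biolink:MolecularMixture"],
}
small_molecule_only = filter_by_types([ferumoxytol], {"biolink:SmallMolecule"})
broadened = filter_by_types([ferumoxytol])
```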

sstemann commented 1 week ago

@gaurav it looks like the same behavior in Hammerhead in Test. @dnsmith124 are there changes expected from the UI to address this in the UI's Hammerhead release?


dnsmith124 commented 1 week ago

@sstemann we do not currently have plans to make any changes here for Hammerhead, though we can shift our priorities to make a change if we need to.