Open sandrine-m opened 1 year ago
I checked the SRI Name Resolver and the node for Feraheme is still present (unassigned Gaurav, sorry for the spam :D). @dnsmith124 is this a UI issue?
in ticket #109
@sandrine-m The autocomplete bar in prod is still returning non-MONDO terms, whereas the one on CI has been updated to return only MONDO terms.
When I search for "ferahe" on CI 5 terms are returned, but they're all UMLS so they're thrown out, which is why no results are shown on CI!
Although it does seem like a bit of a strange behavior for an autocomplete on a Google-like search box (where the user typically encounters an autocomplete searching over 'everything'), the user selects the MVP to query as a first step, and we have a lot of names in NameResolver from UMLS and MeSH that we want to exclude. Given both, I think we might need to change the requirements of the autocomplete for each MVP?
One idea is:
For autocomplete that should complete on 'gene' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with NCBIGene, Ensembl, or HGNC (taken from the top three biolink:Gene id_prefixes). It's not clear to me whether we should autocomplete on non-human gene names/symbols/synonyms, because when I try to search for non-human genes, the query takes forever and no results are returned. If we make a group decision that we should only allow searches for human genes, then we could limit this further to HGNC ids only.
For autocomplete that should complete on 'chemical' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with one of the following (taken from biolink:SmallMolecule id_prefixes):
PUBCHEM.COMPOUND
CHEMBL.COMPOUND
UNII
CHEBI
For autocomplete that should complete on 'disease' or 'phenotype' names/symbols/synonyms, we should limit the autocomplete suggestions to ids that are prefixed with MONDO or HP. This list is more restrictive than biolink in this instance.
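The per-MVP prefix idea above could be sketched roughly as follows. This is only an illustration, not the UI's or NameRes' actual code: the category keys, the `curie` field name, and the case-insensitive prefix comparison are all assumptions made for the example.

```python
# Hypothetical mapping from autocomplete category to allowed CURIE prefixes,
# using the prefix lists proposed in this thread. The category keys are
# made up for this sketch.
ALLOWED_PREFIXES = {
    "gene": ("NCBIGene", "Ensembl", "HGNC"),
    "chemical": ("PUBCHEM.COMPOUND", "CHEMBL.COMPOUND", "UNII", "CHEBI"),
    "disease_or_phenotype": ("MONDO", "HP"),
}


def curie_prefix(curie):
    """Return the part of a CURIE before the first colon."""
    return curie.split(":", 1)[0]


def filter_suggestions(suggestions, category):
    """Keep only suggestions whose CURIE prefix is allowed for the category.

    Each suggestion is assumed to be a dict with a "curie" key. Prefixes are
    compared case-insensitively (an assumption, so that "ChEMBL.COMPOUND"
    and "CHEMBL.COMPOUND" are treated alike).
    """
    allowed = {prefix.upper() for prefix in ALLOWED_PREFIXES[category]}
    return [s for s in suggestions if curie_prefix(s["curie"]).upper() in allowed]
```

With this kind of filter, a 'chemical' search whose only hits are UMLS ids would return an empty list, which matches the empty autocomplete seen on CI for "ferahe".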
I agree with @sierra-moxon. In this particular case Feraheme has a ChEMBL.COMPOUND identifier, so it should show up in the autocomplete. See the ARAX synonym info for Feraheme.
Note that the "ARAX synonym" hyperlink above is to ARAX production, which is very old and that version of the Node Synonymizer was a bit too over-eager in merging concepts.
Our latest version in CI is less over-eager/more limited: https://arax.ci.transltr.io/?term=Feraheme
Thank you @edeutsch! I'll test using the CI version of ARAX. As a user, I found the "ARAX synonym" hyperlink extremely useful, mostly when I cannot find a compound in the UI search field by one of the names I know.
We have both Ferumoxytol and Feraheme in NameRes now, but only as UMLS identifiers, which suggests that the actual PubChem/other identifiers are missing or not cliqued properly. I'll look into whether I can figure out what the actual clique should be (if anybody knows which identifiers we expect here, that would be very helpful!)
@dnsmith124 @gaurav - retesting on CI today. hox1a_screen_recording.mov.zip brca1_screen_recording.mov.zip
@gaurav adding another drug that isn't in Node Norm: https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1201823/
Name: ABATACEPT
There are also missing genes: in Test I can use MVP2 > What chemicals upregulates SIRT1, but in CI SIRT1 is not returned.
From TAQA:
Gaurav and Andy: filtering by SmallMolecule is the issue here; we can broaden this filter in the UI. We are doing some testing here. Andy will do some more testing with the API to change settings to filter out the bad results that came back from doing the filter expansion.
Sandrine: RENCI DEV is doing much better now.
Chris B: In the future maybe we should pre-conflate the name resolver cliques to solve this problem.
For the chemical search, working out the right parameters is painful.
Originally, we limited it to 'smallMolecule' to get rid of answers like '1mg aspirin', '2 mg aspirin', etc.
Digging further into this, there are still some drugs that don't show up even if we allow the broader search of 'chemicalentity'
Sampled the top 50 grossing drugs and removed 1: n = 49
Searching for the brand name:
Summary: I have not identified a set of parameters that keeps out the drugs with 'mg' in the canonical name while also returning the maximum number of drugs actually in the system, and doing it without duplicates.
Fingers crossed for drug conflation!
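For comparison while testing parameter sets, one option is a client-side post-filter that drops dose-specific labels and repeated names. The sketch below is only a rough illustration and not something anyone in this thread has implemented: the `label` field name and the dose regex are assumptions, and the regex only covers 'mg' doses.

```python
import re

# Matches a dose such as "1mg", "2 mg" or "0.5 mg" anywhere in a label.
# Only "mg" is handled; other units would need their own patterns.
DOSE_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s*mg\b", re.IGNORECASE)


def clean_chemical_results(results):
    """Drop dose-specific labels, then drop repeated labels.

    Each result is assumed to be a dict with a "label" key. Duplicates are
    detected by comparing labels case-insensitively; the first one wins.
    """
    seen = set()
    cleaned = []
    for result in results:
        label = result["label"]
        if DOSE_PATTERN.search(label):
            continue  # e.g. "1mg aspirin"
        key = label.strip().lower()
        if key in seen:
            continue  # same name already kept
        seen.add(key)
        cleaned.append(result)
    return cleaned
```

A filter like this does nothing for drugs that are missing from the results in the first place, so it would complement, not replace, broadening the type filter or drug conflation.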
Tagging @vdancik here for future follow-up on this issue regarding the Ferumoxytol/Feraheme use case (I retested the autocomplete today on CI but I am still unable to query the compound).
Update: we talked a bit about expanding the types for the chemical autocomplete at the last Relay, and our main concern is that we'll need to do some testing to make sure we don't include duplicates or crappy results. This would fix ferumoxytol, which still can't be searched for, but that's because it's not a biolink:SmallMolecule: PUBCHEM.COMPOUND:6432052 "ferumoxytol" is the first result on NameLookup, but this is classified as a biolink:MolecularMixture. I think we might be able to get this done by Guppy.
We have some improvements to chemical names in the latest NameLookup (currently in Prod), but the way in which we do drug conflation prefers the drug name instead of the brand name (which is what @Genomewide found last year as well). I think that is what we want (see e.g. https://github.com/NCATSTranslator/Feedback/issues/461), but if not, there are some things we could do (in increasing order of difficulty):
We should be able to come up with a plan for what exactly we want here by Guppy, but I'm not sure how much beyond that it will take to implement the technical solution to the problem that we come up with.
Was there a change that is included in Guppy? I can't tell what it would be, and it seems to behave the same as it ever was in CI.
I wasn't able to meet up with UI to come up with a plan during this sprint, but I was able to extend NameRes' type filtering so that you can now search for multiple Biolink types at once, which would allow UI to allow e.g. biolink:Drug or biolink:SmallMolecule or biolink:MolecularMixture. So we can at least play around with options for broadening this without messing around with the cliques or anything. I'll see if I can organize a meeting with UI over the next Translator Relay to think about options for fixing this fully in Hammerhead.
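To show what broadening the type filter changes for ferumoxytol, here is a minimal local sketch. It does not call NameRes; the result shape (dicts with `curie`, `label`, and `types` keys) is an assumption for the example, and the ferumoxytol record uses only the id and type mentioned earlier in this thread.

```python
def filter_by_types(results, allowed_types):
    """Keep results that carry at least one of the allowed Biolink types.

    Each result is assumed to be a dict with a "types" list.
    """
    allowed = set(allowed_types)
    return [r for r in results if allowed.intersection(r["types"])]


# The three types suggested above for a broadened chemical autocomplete.
BROAD_CHEMICAL_TYPES = [
    "biolink:Drug",
    "biolink:SmallMolecule",
    "biolink:MolecularMixture",
]

# Illustrative record built from the id and type quoted in this thread.
ferumoxytol = {
    "curie": "PUBCHEM.COMPOUND:6432052",
    "label": "ferumoxytol",
    "types": ["biolink:MolecularMixture"],
}

narrow = filter_by_types([ferumoxytol], ["biolink:SmallMolecule"])
broad = filter_by_types([ferumoxytol], BROAD_CHEMICAL_TYPES)
```

Here `narrow` is empty and `broad` contains the ferumoxytol record, which is the behavior change being discussed. Whether the broader set also lets duplicates or dose-specific entries back in still needs the testing mentioned above.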
@gaurav it looks like the same behavior in Hammerhead in Test. @dnsmith124 are there changes expected from the UI to address this in the UI's Hammerhead release?
@sstemann we do not currently have plans to make any changes here for Hammerhead, though we can shift our priorities to make a change if we need to.
This issue is related to issue #149 when retesting using Ferumoxytol/Feraheme. We cannot search for this compound anymore.