Open sierra-moxon opened 1 year ago
@sierra-moxon Do you think this is an action item for the UI (filter out non-human genes, or things we don't have an answer for) or for the NR/NN (add a flag to the request to not send non-human genes, etc.).
Personally I'm inclined toward the latter.
I think there are three pieces to this:
I'm going to loop in @cbizon for his thoughts as well.
I believe there is a plan to collect a list of all known node identifiers from every ARA and KP so that we can identify which ones could even possibly have data in Translator.
Just noting that given its federated design, BTE would not be able to feasibly provide such a list, and certainly wouldn't be able to keep it up to date.
I keep coming back to the idea that we could design simple Solr indexes for Human NCBIGene (names, synonyms, symbols only) and MONDO and HP (names and synonyms only) for use by the autocomplete for September. It would be easier to explain the terms showing up in the autocomplete to the user and we'd eliminate a lot of the noise we're seeing there as identified by the "red team testing" during the relay.
It's true that some diseases have synonyms or alternative names that aren't in MONDO for example, but adding the synonyms is best done by MONDO curators.
Does the UI have enough information in the response back from NR to know what the ID is for the autocomplete suggestion before sending it to the user?
We can probably create a quick wrapper at the attribute server specifically for autocompletion, since the underlying APIs can already cover these queries: MyGene.info covers gene queries by names, synonyms, and symbols, and MyDisease.info and other APIs can cover MONDO and HP (queried by names and synonyms).
We have discussed this and there are many mouse, rat, and other species genes that have results.
And there is no way that I know to tell if there is a result for one until it is run. Like Andrew said, it would be difficult.
We have currently limited the gene search to Entrez Gene which is not species-specific.
We do filter out species using the annotation server, but we have chosen to leave in mouse, rat, and zebrafish.
Maybe an FAQ that says - what if I don't find any results for a species other than human? Then look for the human ortholog, as human genes are the most connected genes in the graph.
@Genomewide looks like this didn't make the FAQ - do you want the UCWG to write some text for this?
@sandrine-m @rhubal can UCWG create a FAQ for this please?
@gaurav can you give us an update on choosing the correct species gene?
only_taxa filtering is now available on Name-Lookup up to Test (e.g. https://name-lookup.test.transltr.io/lookup?string=sox1&autocomplete=true&offset=0&limit=10&only_taxa=NCBITaxon%3A9606%7CNCBITaxon%3A10090%7CNCBITaxon%3A10116%7CNCBITaxon%3A7955) and will be deployed to Prod next week. Note that we only have taxa information for genes at present, and only for genes that have at least one NCBIGene identifier associated with them.
only_taxa filtering is now available on Name-Lookup up to Prod (e.g. https://name-lookup.transltr.io/lookup?string=sox1&autocomplete=true&offset=0&limit=10&only_taxa=NCBITaxon%3A9606%7CNCBITaxon%3A10090%7CNCBITaxon%3A10116%7CNCBITaxon%3A7955). Note that we only have taxa information for genes at present, and only for genes that have at least one NCBIGene identifier associated with them. I've let @dnsmith124 know about this, and I think he's working on figuring out how best to use this to improve autocomplete on genes.
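For anyone wiring this into a client, here is a minimal sketch of how the example request above could be assembled. The base URL and the query parameters (`string`, `autocomplete`, `offset`, `limit`, `only_taxa`) are copied from the Prod example URL; the function name `build_lookup_url` and the `MODEL_TAXA` constant are hypothetical names used only for illustration. It only builds the URL and does not call the service.

```python
from urllib.parse import urlencode

# Base URL copied from the Prod example in this thread.
NAME_LOOKUP_URL = "https://name-lookup.transltr.io/lookup"

# The four taxon IDs that appear in the example URL
# (hypothetical constant name, used only for this sketch).
MODEL_TAXA = [
    "NCBITaxon:9606",
    "NCBITaxon:10090",
    "NCBITaxon:10116",
    "NCBITaxon:7955",
]


def build_lookup_url(query, taxa=MODEL_TAXA, limit=10, offset=0,
                     base_url=NAME_LOOKUP_URL):
    """Build a lookup URL with an only_taxa filter.

    Taxon IDs are joined with '|' as in the example URL;
    urlencode percent-encodes ':' as %3A and '|' as %7C.
    """
    params = {
        "string": query,
        "autocomplete": "true",
        "offset": offset,
        "limit": limit,
        "only_taxa": "|".join(taxa),
    }
    return f"{base_url}?{urlencode(params)}"


url = build_lookup_url("sox1")
print(url)
```

Passing a different `base_url` (for example the Test host from the earlier comment) should produce the equivalent Test request.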
I'm not sure what the change is for Guppy
so I'm not sure there is anything here for Guppy to be done @dnsmith124
@sstemann what you've said matches my understanding as well
@dnsmith124 I think you should add taxon filtering to your NameRes gene requests so you find all the model organisms -- but I don't know if that's something for Hammerhead or Beyond.
@gaurav how would taxon filtering at the NameRes level differ from our current method of reaching out to mygene to get taxon info after getting terms from the NameRes? Will these terms that have no results not return from NameRes if we switch to using its taxon filtering?
The downside to taxon filtering via NameRes is that we currently ONLY do this for cliques containing NCBIGene: prefixes, because that's the only place we get taxon IDs from at the moment. But I think all the genes we're interested in have NCBIGene identifiers, so I think that should be okay. And yes, if you set a taxon filter, anything missing taxon information will be silently ignored.
The (big!) upside to taxon filtering via NameRes is that the frontend currently requests ~100 results, and then filters them using the MyGene information -- but any model organism gene results that didn't make the top 100 results are silently ignored. With NameRes taxon filtering, you'll be guaranteed to get all the model organism results we know about. Now, hopefully I've improved the searching enough that all human genes should be within the top ~100 results, but without the filtering there's no way to be sure. (Unless I build some kind of "sort-human/rat/mouse-genes-first" functionality, which sounds tricky but maybe doable?)
The other (potential) upside is that it removes one call in the UI so it should be a bit less latency. Unless, of course, you are using other info from the call to mygene.
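To make the difference concrete, here is a small sketch comparing the two orderings: limiting first and filtering by taxon afterwards (what the frontend does today, per the comment above) versus filtering by taxon before limiting (what NameRes taxon filtering would do). The ranked list is made-up illustrative data, not real NameRes output, and the function names are hypothetical. It also shows that an entry with no taxon information is dropped once a filter is set.

```python
# Made-up ranked results as (label, taxon) pairs, best match first.
# These are NOT real NameRes results; they only illustrate the ordering issue.
ranked = [
    ("gene-a", "NCBITaxon:9606"),    # human
    ("gene-b", None),                # no taxon information available
    ("gene-c", "NCBITaxon:99999"),   # placeholder for a taxon outside the filter
    ("gene-d", "NCBITaxon:99999"),   # placeholder for a taxon outside the filter
    ("gene-e", "NCBITaxon:10090"),   # mouse
]

WANTED_TAXA = {"NCBITaxon:9606", "NCBITaxon:10090"}


def limit_then_filter(results, limit):
    """Take the top `limit` results, then keep only the wanted taxa."""
    return [r for r in results[:limit] if r[1] in WANTED_TAXA]


def filter_then_limit(results, limit):
    """Keep only the wanted taxa, then take the top `limit` results."""
    return [r for r in results if r[1] in WANTED_TAXA][:limit]


client_side = limit_then_filter(ranked, limit=3)
server_side = filter_then_limit(ranked, limit=3)

print(client_side)  # the mouse gene ranked 5th never reaches the filter
print(server_side)  # both the human and the mouse gene are returned
```

With a limit of 3, the first approach loses the mouse gene because it ranks below the cutoff, while the second keeps it.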
@dnsmith124 will there be a change in the UI for this in Hammerhead? It's currently still the same as Aug 29, but with the Hammerhead backend https://ui.test.transltr.io/results?l=Sox1%20(Mouse)&i=NCBIGene:20664&t=1&r=0&q=73aab312-0e29-4f6d-914f-162cb7bc0439
@sstemann we currently do not have plans to incorporate Name Resolver's taxonomy filtering in Hammerhead
Original Title
what chemical upregulates Sox1 (Mouse) ...
Original Description
First attempt: https://ui.test.transltr.io/results?l=Sox1&t=1&q=609d41de-b2e8-485b-8769-fb98a0bdd3a4 (reload had the same issue)
re-running in a new session/tab: no results returned after 15 mins.
I think if we don't have results for an answer, we shouldn't put it in the autocomplete.
In addition, should mouse, zebrafish, etc. genes show up in general in autocomplete?