NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Add Taxon Filtering to UI NameRes Gene Requests to Find All the Model Organisms #310

Open sierra-moxon opened 1 year ago

sierra-moxon commented 1 year ago

Original Title

what chemical upregulates Sox1 (Mouse) ...

Original Description

First attempt: https://ui.test.transltr.io/results?l=Sox1&t=1&q=609d41de-b2e8-485b-8769-fb98a0bdd3a4 (reload had the same issue)

Fetching current ARA status...
/creative_status:1     Failed to load resource: the server responded with a status of 502 ()
utilities.js:152 No error callback function specified.
ResultsList.js:243 Error
    at De (utilities.js:155:11)
    at ResultsList.js:208:25

re-running in a new session/tab: no results returned after 15 mins.

I think if we don't have results for an answer, we shouldn't put it in the autocomplete.
In addition, should mouse, zebrafish, etc genes show up in general in autocomplete?

dnsmith124 commented 1 year ago

@sierra-moxon Do you think this is an action item for the UI (filter out non-human genes, or things we don't have an answer for) or for the NR/NN (add a flag to the request to not send non-human genes, etc.)?

Personally I'm inclined toward the latter.

gaurav commented 1 year ago

I think there are three pieces to this:

  1. Filter out non-human genes/proteins from the autocomplete -- I think the quickest way to get this working would be via the Attribute Server, which could tell the UI which taxon a particular identifier came from. If we want this information to be available in NameRes, we would need to add it to Babel (https://github.com/TranslatorSRI/Babel/issues/155).
  2. I don't think we should filter out concepts that we don't have results for (otherwise users might get frustrated when autocomplete can't find a disease they know about), but it would be nice to be able to grey those out or otherwise indicate a lack of results (maybe with some way for users to say "I'd really like information about this concept!").
  3. I believe there is a plan to collect a list of all known node identifiers from every ARA and KP so that we can identify which ones could even possibly have data in Translator. If we have that information, we could probably add that to Babel pretty quickly, and present that information in both NodeNorm and NameRes. Should we be aiming for that?

I'm going to loop in @cbizon for his thoughts as well.

andrewsu commented 1 year ago

> I believe there is a plan to collect a list of all known node identifiers from every ARA and KP so that we can identify which ones could even possibly have data in Translator.

Just noting that given its federated design, BTE would not be able to feasibly provide such a list, and certainly wouldn't be able to keep it up to date.

sierra-moxon commented 1 year ago

I keep coming back to the idea that we could design simple Solr indexes for Human NCBIGene (names, synonyms, symbols only) and MONDO and HP (names and synonyms only) for use by the autocomplete for September. It would be easier to explain the terms showing up in the autocomplete to the user and we'd eliminate a lot of the noise we're seeing there as identified by the "red team testing" during the relay.

It's true that some diseases have synonyms or alternative names that aren't in MONDO for example, but adding the synonyms is best done by MONDO curators.

Does the UI have enough information in the response back from NR to know what the ID is for the autocomplete suggestion before sending it to the user?

newgene commented 1 year ago

We can probably create a quick wrapper at the attribute server specifically for autocompletion, since underlying APIs like MyGene.info already cover gene queries by names, synonyms, and symbols, and MyDisease.info and other APIs can cover MONDO and HP (queried by names and synonyms).

Genomewide commented 1 year ago

We have discussed this and there are many mouse, rat, and other species genes that have results.
And there is no way that I know to tell if there is a result for one until it is run. Like Andrew said, it would be difficult.
We have currently limited the gene search to Entrez Gene which is not species-specific.
We do filter out species using the annotation server, but we have chosen to leave in mouse, rat, and zebrafish.

Maybe an FAQ that says: what if I don't find any results for a species other than human? Then look for the human ortholog, as human genes are the most connected genes in the graph.

sstemann commented 1 year ago

@Genomewide looks like this didn't make the FAQ - do you want the UCWG to write some text for this?

Genomewide commented 1 year ago

@sandrine-m @rhubal can UCWG create a FAQ for this please?

cbizon commented 9 months ago

@gaurav can you give us an update on choosing the correct species gene?

gaurav commented 4 months ago

only_taxa filtering is now available on Name-Lookup up to Test (e.g. https://name-lookup.test.transltr.io/lookup?string=sox1&autocomplete=true&offset=0&limit=10&only_taxa=NCBITaxon%3A9606%7CNCBITaxon%3A10090%7CNCBITaxon%3A10116%7CNCBITaxon%3A7955) and will be deploying to Prod next week. Note that we only have taxa information for genes at present, and only for genes that have at least one NCBIGene identifier associated with them.

gaurav commented 3 months ago

only_taxa filtering is now available on Name-Lookup up to Prod (e.g. https://name-lookup.transltr.io/lookup?string=sox1&autocomplete=true&offset=0&limit=10&only_taxa=NCBITaxon%3A9606%7CNCBITaxon%3A10090%7CNCBITaxon%3A10116%7CNCBITaxon%3A7955). Note that we only have taxa information for genes at present, and only for genes that have at least one NCBIGene identifier associated with them. I've let @dnsmith124 know about this, and I think he's working on figuring out how best to use this to improve autocomplete on genes.
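For anyone wiring this into a client, here is a minimal sketch of building the same request in Python. The base URL and query parameters are copied from the example URL above; the function name `build_lookup_url` and the constant names are made up for illustration, and the sketch only constructs the URL rather than calling the service.

```python
from urllib.parse import urlencode

# Base URL and parameter names come from the example URL in the comment above.
NAME_LOOKUP_URL = "https://name-lookup.transltr.io/lookup"

# Human, mouse, rat, and zebrafish: the four taxa used in that example URL.
MODEL_TAXA = [
    "NCBITaxon:9606",
    "NCBITaxon:10090",
    "NCBITaxon:10116",
    "NCBITaxon:7955",
]


def build_lookup_url(query, taxa=MODEL_TAXA, limit=10, offset=0):
    """Hypothetical helper: build a lookup URL with a pipe-separated only_taxa filter."""
    params = {
        "string": query,
        "autocomplete": "true",
        "offset": offset,
        "limit": limit,
        "only_taxa": "|".join(taxa),  # urlencode turns ':' into %3A and '|' into %7C
    }
    return NAME_LOOKUP_URL + "?" + urlencode(params)


print(build_lookup_url("sox1"))
```

Called with `"sox1"`, this reproduces the Prod example URL above, so it may be a convenient starting point for swapping in a different set of taxa.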

sstemann commented 2 months ago

i'm not sure what the change is for Guppy

so i'm not sure there is anything here for Guppy to be done @dnsmith124


dnsmith124 commented 2 months ago

@sstemann what you've said matches my understanding as well

gaurav commented 2 months ago

@dnsmith124 I think you should add taxon filtering to your NameRes gene requests so you find all the model organisms -- but I don't know if that's something for Hammerhead or Beyond.

dnsmith124 commented 2 months ago

@gaurav how would taxon filtering at the NameRes level differ from our current method of reaching out to mygene to get taxon info after getting terms from the NameRes? Will these terms that have no results not return from NameRes if we switch to using its taxon filtering?

gaurav commented 2 months ago

> @gaurav how would taxon filtering at the NameRes level differ from our current method of reaching out to mygene to get taxon info after getting terms from the NameRes? Will these terms that have no results not return from NameRes if we switch to using its taxon filtering?

The downside to taxon filtering via NameRes is that we currently ONLY do this for cliques containing NCBIGene: prefixes, because that's the only place we get taxon IDs from at the moment. But I think all the genes we're interested in have NCBIGene identifiers, so I think that should be okay. And yes, if you set a taxon filter, anything missing taxon information will be silently ignored.

The (big!) upside to taxon filtering via NameRes is that the frontend currently requests ~100 results, and then filters them using the MyGene information -- but any model-organism gene results that didn't make the top 100 results are silently ignored. With NameRes taxon filtering, you'll be guaranteed to get all the model-organism results we know about. Now, hopefully I've improved the searching enough that all human genes should be within the top ~100 results, but without the filtering there's no way to be sure. (Unless I build some kind of "sort-human/rat/mouse-genes-first" functionality, which sounds tricky but maybe doable?)
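To make the difference concrete, here is a toy sketch in Python of the two orderings described above: truncate-then-filter (the current frontend approach) versus filter-then-truncate (what a taxon filter in NameRes would give). The data, the `Hit` class, and both function names are invented for illustration; this is not the actual UI or NameRes code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    curie: str
    taxon: Optional[str]  # None models a clique with no taxon information


def client_side(ranked, allowed, top_n):
    """Take the top N hits first, then drop those outside the allowed taxa."""
    return [h for h in ranked[:top_n] if h.taxon in allowed]


def server_side(ranked, allowed, limit):
    """Drop hits outside the allowed taxa first, then apply the limit."""
    return [h for h in ranked if h.taxon in allowed][:limit]


# Synthetic ranking: the mouse gene sits just outside the top 5.
ranked = [
    Hit("EX:other0", "OTHER"),
    Hit("EX:human", "NCBITaxon:9606"),
    Hit("EX:unknown", None),  # no taxon, so any taxon filter drops it
    Hit("EX:other1", "OTHER"),
    Hit("EX:other2", "OTHER"),
    Hit("EX:other3", "OTHER"),
    Hit("EX:mouse", "NCBITaxon:10090"),
]
allowed = {"NCBITaxon:9606", "NCBITaxon:10090"}

print([h.curie for h in client_side(ranked, allowed, top_n=5)])  # ['EX:human']
print([h.curie for h in server_side(ranked, allowed, limit=5)])  # ['EX:human', 'EX:mouse']
```

In this toy ranking the mouse gene is lost by the truncate-then-filter path but kept by the filter-then-truncate path, and the hit with no taxon is dropped either way.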

cbizon commented 2 months ago

The other (potential) upside is that it removes one call in the UI so it should be a bit less latency. Unless, of course, you are using other info from the call to mygene.

sstemann commented 1 week ago

@dnsmith124 will there be a change in the UI for this in Hammerhead? It's currently still the same as Aug 29, but with the Hammerhead backend: https://ui.test.transltr.io/results?l=Sox1%20(Mouse)&i=NCBIGene:20664&t=1&r=0&q=73aab312-0e29-4f6d-914f-162cb7bc0439

dnsmith124 commented 5 days ago

@sstemann we currently do not have plans to incorporate Name Resolver's taxonomy filtering in Hammerhead