NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Gene `tp53` returns human gene, but not mouse gene #470

Open gaurav opened 1 year ago

gaurav commented 1 year ago

This is because the human gene has a preferred label of tp53, while the mouse gene has a preferred label of trp53. The mouse gene is therefore sorted below all the other genes that have a preferred label of tp53. I'll try to turn off preferred label boosting and see if that fixes this issue without causing problems with other searches.

Reported by @Genomewide

gaurav commented 1 year ago

This doesn't seem to be fixed by just turning off preferred label boosting, and it doesn't help that the mouse gene doesn't currently include tp53 as a synonym:

{
  "NCBIGene:7157": [
    "TRP53",
    "tumor suppressor p53",
    "TP53 gene",
    "gene p53",
    "p53 tumor suppressor",
    "p53 oncogene",
    "Genes, p53",
    "LFS1",
    "p53 Genes",
    "p53 tumor suppressor gene",
    "TP A 053 GENES",
    "LFS1",
    "p53",
    "TP53 Gene",
    "TP53",
    "Gene, p53",
    "Gene, TP53",
    "p53",
    "P A 053 GENES",
    "Li-Fraumeni syndrome",
    "GENES TP 053",
    "TP53 Genes",
    "TUMOR PROTEIN p53",
    "P53",
    "p53 Gene",
    "Li-Fraumeni syndrome",
    "p53 gene",
    "Tumor Protein P53 (Li-Fraumeni Syndrome) Gene",
    "tumor protein p53",
    "Genes, TP53",
    "GENES P 053",
    "TRANSFORMATION-RELATED PROTEIN 53",
    "tp53"
  ],
  "NCBIGene:22059": [
    "Trp53 (Mmus)",
    "Trp53"
  ]
}

So, possible solutions:

  1. Look for more synonyms on the mouse genes (possibly by conflating with proteins)
  2. Use information from matching synonyms to connect concepts (e.g. note that both of these genes share a synonym if you ignore case, "TRP53" on the human gene and "Trp53" on the mouse gene -- maybe we could use that to recommend alternate searches or combine results)
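A rough sketch of what option 2 could look like, assuming the synonym lists are available as a plain dict keyed by CURIE (as in the JSON above, trimmed here for brevity). The function name `shared_synonyms` is hypothetical and is not part of NameLookup; the matching is case-insensitive because the human gene has "TRP53" while the mouse gene has "Trp53".

```python
from collections import defaultdict


def shared_synonyms(synonyms_by_curie):
    """Return {case-folded synonym: set of CURIEs} for synonyms found on 2+ CURIEs.

    Hypothetical helper for illustration only; not an existing NameLookup function.
    """
    index = defaultdict(set)
    for curie, names in synonyms_by_curie.items():
        for name in names:
            # Fold case so "TRP53" and "Trp53" land in the same bucket.
            index[name.casefold()].add(curie)
    # Keep only synonyms that connect at least two different concepts.
    return {name: curies for name, curies in index.items() if len(curies) > 1}


# Trimmed copy of the synonym lists shown above.
synonyms = {
    "NCBIGene:7157": ["TRP53", "TP53", "p53", "tp53"],
    "NCBIGene:22059": ["Trp53 (Mmus)", "Trp53"],
}

links = shared_synonyms(synonyms)
print(links)
```

On this trimmed input the only link found is the case-folded "trp53", which connects NCBIGene:7157 and NCBIGene:22059. A search for "tp53" that hits the human gene could then suggest the linked mouse gene as an alternate result.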

I'll take this up again after the Relay.

Genomewide commented 1 year ago

I think it would be interesting to conflate the protein IDs with these. Definitely worth discussing.
I think that if the mouse gene does not have that alias, then not showing it is probably right, so I am not too concerned that it is missing. But expanding the search for the genes would be worthwhile.

gaurav commented 1 month ago

  1. Actually, conflating with proteins won't help in this case, since that doesn't pull in trp53 or (apparently) any mouse genes -- only human genes.
  2. NameLookup now supports filtering by taxa on genes, so you can ask for genes named "tp53" for humans, mice, rats and zebrafish. However, trp53 (Mmus) is the 86th item on that list, and I don't think anybody will have the patience to scroll that far down the list.
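For anyone who wants to reproduce the ranking in point 2, here is a minimal sketch. The `/lookup` path, the `string`, `only_taxa` and `limit` parameter names, the pipe-separated taxa format, and the `curie` key on each result are all assumptions on my part, so check them against the NameLookup API docs before relying on this. The base URL is a placeholder, and the result list in the usage example is made up.

```python
from urllib.parse import urlencode

# NCBI Taxonomy IDs for the four species mentioned above.
TAXA = {
    "human": "NCBITaxon:9606",
    "mouse": "NCBITaxon:10090",
    "rat": "NCBITaxon:10116",
    "zebrafish": "NCBITaxon:7955",
}


def build_lookup_url(base_url, query, taxa, limit=100):
    """Build a taxa-filtered lookup URL (endpoint and parameter names are assumed)."""
    params = {
        "string": query,
        "only_taxa": "|".join(taxa),  # assumed: pipe-separated taxon CURIEs
        "limit": limit,
    }
    return f"{base_url}/lookup?{urlencode(params)}"


def rank_of(results, curie):
    """Return the 1-based position of `curie` in `results`, or None if it is absent."""
    for position, result in enumerate(results, start=1):
        if result.get("curie") == curie:  # assumed: each result is a dict with a "curie" key
            return position
    return None


# Placeholder base URL; substitute the real NameLookup instance.
url = build_lookup_url(
    "https://nameres.example.org", "tp53", [TAXA["human"], TAXA["mouse"]]
)

# Made-up results, only to show how rank_of is used.
example_results = [{"curie": "NCBIGene:7157"}, {"curie": "NCBIGene:22059"}]
print(url)
print(rank_of(example_results, "NCBIGene:22059"))
```

Running `rank_of` on the real response for NCBIGene:22059 gives a number that can be tracked as ranking changes are tried.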

I don't think this is a super high priority, so I'm going to schedule it for Hammerhead. But if anybody has any idea why tp53 has a different name in mice, or knows of a database that includes "tp53" as a synonym for trp53 (Mmus), please let me know!