biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Mouse genes cannot be queried by ensembl ID #139

Closed (cconow closed this issue 9 months ago)

cconow commented 10 months ago

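import mygene

# Query with the mygene Python client; returnall=True also returns the "dup" and "missing" lists.
mg = mygene.MyGeneInfo()
mg.querymany(["Eef2", "ENSMUSG00000034994", "ENSG00000167658"], species="mouse,human", scopes="_id,symbol,ensemblgene", fields="ensembl,symbol,genomic_pos", returnall=True)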

Returns

{
  "out": [
    {
      "query": "Eef2",
      "_id": "1938",
      "_score": 17.811184,
      "ensembl": {
        "gene": "ENSG00000167658",
        "protein": [
          "ENSP00000307940",
          "ENSP00000471265"
        ],
        "transcript": [
          "ENST00000309311",
          "ENST00000594885",
          "ENST00000596417",
          "ENST00000598182",
          "ENST00000598436",
          "ENST00000600720",
          "ENST00000600794"
        ],
        "translation": [
          {
            "protein": "ENSP00000471265",
            "rna": "ENST00000600794"
          },
          {
            "protein": "ENSP00000307940",
            "rna": "ENST00000309311"
          }
        ],
        "type_of_gene": "protein_coding"
      },
      "genomic_pos": {
        "chr": "19",
        "end": 3985463,
        "ensemblgene": "ENSG00000167658",
        "start": 3976056,
        "strand": -1
      },
      "symbol": "EEF2"
    },
    {
      "query": "Eef2",
      "_id": "13629",
      "_score": 14.938413,
      "genomic_pos": {
        "chr": "10",
        "end": 81018332,
        "ensemblgene": "ENSMUSG00000034994",
        "start": 81012465,
        "strand": 1
      },
      "symbol": "Eef2"
    },
    {
      "query": "ENSMUSG00000034994",
      "notfound": true
    },
    {
      "query": "ENSG00000167658",
      "_id": "1938",
      "_score": 26.22843,
      "ensembl": {
        "gene": "ENSG00000167658",
        "protein": [
          "ENSP00000307940",
          "ENSP00000471265"
        ],
        "transcript": [
          "ENST00000309311",
          "ENST00000594885",
          "ENST00000596417",
          "ENST00000598182",
          "ENST00000598436",
          "ENST00000600720",
          "ENST00000600794"
        ],
        "translation": [
          {
            "protein": "ENSP00000471265",
            "rna": "ENST00000600794"
          },
          {
            "protein": "ENSP00000307940",
            "rna": "ENST00000309311"
          }
        ],
        "type_of_gene": "protein_coding"
      },
      "genomic_pos": {
        "chr": "19",
        "end": 3985463,
        "ensemblgene": "ENSG00000167658",
        "start": 3976056,
        "strand": -1
      },
      "symbol": "EEF2"
    }
  ],
  "dup": [
    [
      "Eef2",
      2
    ]
  ],
  "missing": [
    "ENSMUSG00000034994"
  ]
}

This shows that the human hit for Eef2 includes the ensembl field, while the mouse hit does not. Additionally, querying by Ensembl gene ID works for human (ENSG00000167658) but not for mouse (ENSMUSG00000034994 is reported as notfound). This is consistent across all genes I have tried. Interestingly, genomic_pos still shows the Ensembl ID for mouse.
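To scan a larger batch for the same symptom, a small helper along these lines could list the hits that came back without an ensembl field, together with whatever Ensembl gene ID genomic_pos still carries. This is only a sketch: the function names are hypothetical, the sample dictionary is a trimmed copy of the output above, and the handling of genomic_pos as either a dict or a list of dicts is an assumption about how multi-location genes are returned.

```python
def ensembl_ids_from_genomic_pos(hit):
    """Collect Ensembl gene IDs recorded under genomic_pos, if any."""
    pos = hit.get("genomic_pos")
    if pos is None:
        return []
    # Assumption: genomic_pos may be a single dict or a list of dicts.
    entries = pos if isinstance(pos, list) else [pos]
    return [p["ensemblgene"] for p in entries if "ensemblgene" in p]


def hits_missing_ensembl(result):
    """Return (query, _id, ensembl IDs from genomic_pos) for hits lacking 'ensembl'."""
    report = []
    for hit in result.get("out", []):
        # Skip queries that were not found at all, and hits that look fine.
        if hit.get("notfound") or "ensembl" in hit:
            continue
        report.append((hit["query"], hit["_id"], ensembl_ids_from_genomic_pos(hit)))
    return report


# Trimmed copy of the returnall=True output shown above.
sample = {
    "out": [
        {
            "query": "Eef2",
            "_id": "1938",
            "ensembl": {"gene": "ENSG00000167658"},
            "genomic_pos": {"chr": "19", "ensemblgene": "ENSG00000167658"},
            "symbol": "EEF2",
        },
        {
            "query": "Eef2",
            "_id": "13629",
            "genomic_pos": {"chr": "10", "ensemblgene": "ENSMUSG00000034994"},
            "symbol": "Eef2",
        },
        {"query": "ENSMUSG00000034994", "notfound": True},
    ],
    "missing": ["ENSMUSG00000034994"],
}

print(hits_missing_ensembl(sample))
```

On the sample, only the mouse record (13629) is reported, and its genomic_pos entry supplies the Ensembl ID that the ensemblgene scope failed to match.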

newgene commented 10 months ago

@cconow thanks for letting us know. I can confirm this issue from this mouse gene record:

https://mygene.info/v3/gene/13629?fields=ensembl

This returns an empty result, but it should return a matching ensembl field (e.g., as in this gene).

We had a similar issue with other genes when we recently updated the data to the latest Ensembl v110 release. We deployed a fix last week, but it looks like some genes are still missing it. We will take a closer look and hope to fix them very soon.

jal347 commented 9 months ago

I have deployed the fix for the Ensembl v110 release. Let me know if you have any problems. The links that were failing are working now:

https://mygene.info/v3/gene/13629?fields=ensembl

https://mygene.info/v3/gene/ENSMUSG00000034994
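For anyone who wants to re-check these two links from a script, a rough sketch using only the Python standard library might look like the following. The URL pattern is taken from the links above; the helper names (gene_url, fetch_gene, has_ensembl) are hypothetical, and the possibility that a lookup by Ensembl ID returns a list of records rather than a single one is an assumption handled defensively.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://mygene.info/v3/gene"


def gene_url(gene_id, fields=None):
    """Build a gene annotation URL in the same shape as the links above."""
    url = f"{BASE_URL}/{quote(str(gene_id), safe='')}"
    if fields:
        url += "?" + urlencode({"fields": fields})
    return url


def fetch_gene(gene_id, fields=None, timeout=30):
    """Fetch and decode the JSON for one gene (performs a network request)."""
    with urlopen(gene_url(gene_id, fields), timeout=timeout) as response:
        return json.load(response)


def has_ensembl(record):
    """True if the record (or any record in a list) has a non-empty 'ensembl' field."""
    # Assumption: the response may be a list when an ID matches several genes.
    records = record if isinstance(record, list) else [record]
    return any(isinstance(r, dict) and bool(r.get("ensembl")) for r in records)
```

With network access, `has_ensembl(fetch_gene("13629", fields="ensembl"))` and `has_ensembl(fetch_gene("ENSMUSG00000034994"))` should both be true if the fix is in place.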

newgene commented 9 months ago

I also want to add that we are adding data tests to cover more species, such as mouse and rat, in addition to the human genes covered by our current test suite. This issue seems to have affected only mouse genes, not human genes, so our current test procedure did not catch it before the data release.
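As an illustration of what a per-species spot check could look like (not the project's actual test suite, which is not shown in this thread), the sketch below runs a small table of known Ensembl IDs through a lookup function and reports any that are not found, resolve to an unexpected gene, or lack the ensembl field. The two entries come from this issue; `run_spot_checks` and the `lookup` callable are hypothetical names, and in practice `lookup` would wrap a real query against the data release being validated.

```python
# (species, Ensembl gene ID, expected _id), taken from the records in this issue.
SPOT_CHECKS = [
    ("human", "ENSG00000167658", "1938"),
    ("mouse", "ENSMUSG00000034994", "13629"),
]


def run_spot_checks(lookup, checks=SPOT_CHECKS):
    """Run each check through lookup(ensembl_id, species) and collect failures.

    lookup is expected to return a gene record (dict) or None when not found.
    """
    failures = []
    for species, ensembl_id, expected_id in checks:
        hit = lookup(ensembl_id, species)
        if hit is None:
            failures.append((species, ensembl_id, "not found"))
        elif str(hit.get("_id")) != expected_id:
            failures.append((species, ensembl_id, "unexpected _id"))
        elif "ensembl" not in hit:
            failures.append((species, ensembl_id, "missing ensembl field"))
    return failures


def buggy_lookup(ensembl_id, species):
    """Stand-in that mimics the behaviour reported above: human works, mouse does not."""
    if ensembl_id == "ENSG00000167658":
        return {"_id": "1938", "ensembl": {"gene": "ENSG00000167658"}}
    return None


print(run_spot_checks(buggy_lookup))
```

Keeping the species in the check table makes it straightforward to add rat or other organisms later without changing the checking logic.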