biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

suggest_from symbol query matches fields other than symbol #5

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

The query https://mygene.info/v3/query?q=GABA*&suggest_from=symbol^10&species=human&entrezonly=true is intended to do a partial search by gene symbol for the term GABA, so every result should have an official symbol that starts with GABA.

However, the following payload is returned:

{
  "total": 40,
  "max_score": 1.55,
  "took": 6,
  "hits": [
    {
      "_id": "4942",
      "_score": 1.55,
      "entrezgene": 4942,
      "name": "ornithine aminotransferase",
      "symbol": "OAT",
      "taxid": 9606
    },
    {
      "_id": "2554",
      "_score": 1.55,
      "entrezgene": 2554,
      "name": "gamma-aminobutyric acid type A receptor alpha1 subunit",
      "symbol": "GABRA1",
      "taxid": 9606
    },
    {
      "_id": "9568",
      "_score": 1.55,
      "entrezgene": 9568,
      "name": "gamma-aminobutyric acid type B receptor subunit 2",
      "symbol": "GABBR2",
      "taxid": 9606
    },
    {
      "_id": "11345",
      "_score": 1.55,
      "entrezgene": 11345,
      "name": "GABA type A receptor associated protein like 2",
      "symbol": "GABARAPL2",
      "taxid": 9606
    },
    {
      "_id": "6529",
      "_score": 1.55,
      "entrezgene": 6529,
      "name": "solute carrier family 6 member 1",
      "symbol": "SLC6A1",
      "taxid": 9606
    },
    {
      "_id": "11337",
      "_score": 1.55,
      "entrezgene": 11337,
      "name": "GABA type A receptor-associated protein",
      "symbol": "GABARAP",
      "taxid": 9606
    },
    {
      "_id": "23710",
      "_score": 1.55,
      "entrezgene": 23710,
      "name": "GABA type A receptor associated protein like 1",
      "symbol": "GABARAPL1",
      "taxid": 9606
    },
    {
      "_id": "7915",
      "_score": 1.55,
      "entrezgene": 7915,
      "name": "aldehyde dehydrogenase 5 family member A1",
      "symbol": "ALDH5A1",
      "taxid": 9606
    },
    {
      "_id": "223",
      "_score": 1.55,
      "entrezgene": 223,
      "name": "aldehyde dehydrogenase 9 family member A1",
      "symbol": "ALDH9A1",
      "taxid": 9606
    },
    {
      "_id": "2566",
      "_score": 1.55,
      "entrezgene": 2566,
      "name": "gamma-aminobutyric acid type A receptor gamma2 subunit",
      "symbol": "GABRG2",
      "taxid": 9606
    }
  ]
}

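It looks like only three of the ten hits shown (GABARAPL2, GABARAP and GABARAPL1) actually match on the symbol field; the rest appear to match on other fields such as name. Is this a bug, or am I misunderstanding the effect of the query?

Also, is there documentation for suggest_from and all of its options?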
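
The count of symbol matches can be checked with a small client-side filter. This is only a sketch: the `hits` list is copied by hand from the symbols in the payload above, and `symbols_with_prefix` is a hypothetical helper, not part of any MyGene.info client.

```python
# Symbols copied from the "hits" array in the response above, in order.
hits = [
    {"symbol": "OAT"},
    {"symbol": "GABRA1"},
    {"symbol": "GABBR2"},
    {"symbol": "GABARAPL2"},
    {"symbol": "SLC6A1"},
    {"symbol": "GABARAP"},
    {"symbol": "GABARAPL1"},
    {"symbol": "ALDH5A1"},
    {"symbol": "ALDH9A1"},
    {"symbol": "GABRG2"},
]


def symbols_with_prefix(hits, prefix):
    """Return the symbols of all hits whose symbol starts with `prefix`.

    The comparison is case-insensitive; hits without a symbol are skipped.
    """
    prefix = prefix.upper()
    return [
        hit["symbol"]
        for hit in hits
        if hit.get("symbol", "").upper().startswith(prefix)
    ]


matching = symbols_with_prefix(hits, "GABA")
print(matching)  # ['GABARAPL2', 'GABARAP', 'GABARAPL1']
```

Note that GABRA1, GABBR2 and GABRG2 start with GAB but not GABA, so they do not count as symbol-prefix matches.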

newgene commented 7 years ago

@dhimmel this suggest_from parameter is just something I suggested earlier as a possible implementation, but it was never actually implemented. We are currently implementing a more flexible solution that lets users define a custom query for those relatively complicated cases that cannot be easily formulated using the existing query syntax. We will let you know once it's in place (expected in one or two weeks).

For the particular example above, you can actually use this query right now:

https://mygene.info/v3/query?q=symbol:GABA*^10&species=human&entrezonly=true
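
For scripted use, the same fielded query can be assembled with the Python standard library. This is a minimal sketch: the endpoint and the `q`, `species` and `entrezonly` parameters are taken from the URL above, while `build_symbol_prefix_url` and its `boost` argument are hypothetical names introduced here for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL above.
BASE_URL = "https://mygene.info/v3/query"


def build_symbol_prefix_url(prefix, species="human", boost=10):
    """Build a query URL restricted to symbols starting with `prefix`.

    Hypothetical helper: it only reproduces the `symbol:PREFIX*^BOOST`
    query string shown above and percent-encodes it for use in a URL.
    """
    params = {
        "q": f"symbol:{prefix}*^{boost}",  # wildcard search on the symbol field
        "species": species,
        "entrezonly": "true",
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_symbol_prefix_url("GABA")
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and decoded with `json.load`. `urlencode` percent-encodes the `:`, `*` and `^` characters, which should be equivalent to the hand-written URL once the server decodes it.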

dhimmel commented 7 years ago

> this suggest_from parameter is just something I suggested earlier as a possible implementation, but it was never actually implemented.

My bad!