Closed: dhimmel closed this issue 7 years ago
You can query by a gene symbol directly:
http://mygene.info/v3/query?q=A1BG
or, if you want the match on official symbol only:
http://mygene.info/v3/query?q=symbol:A1BG
Also note that, by default, the query service returns matches only from human, mouse, and rat. Because we include every gene-coding species, returning matches for all species by default would not fit most of our users' use cases.
You can still get the matches for all species if you want:
http://mygene.info/v3/query?q=A1BG&species=all
Or, if you want a specific species:
http://mygene.info/v3/query?q=A1BG&species=mouse
Ref: http://docs.mygene.info/en/v3/doc/query_service.html#species
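The URL patterns above can also be assembled programmatically. The following is a minimal Python sketch, not part of MyGene.info: the helper name `build_query_url` is hypothetical, and it uses only the `q` and `species` parameters shown in this thread. Note that `urlencode` percent-encodes the colon in a fielded query such as symbol:A1BG, which should decode to the same query on the server side.

```python
from urllib.parse import urlencode

# Endpoint taken from the examples in this thread.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(term, field=None, species=None):
    """Build a query URL (hypothetical helper, for illustration only).

    term    -- the search term, e.g. "A1BG"
    field   -- optional field prefix, e.g. "symbol"
    species -- optional species filter, e.g. "all" or "mouse"
    """
    q = f"{field}:{term}" if field else term
    params = {"q": q}
    if species is not None:
        params["species"] = species
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("A1BG"))
    print(build_query_url("A1BG", field="symbol"))
    print(build_query_url("A1BG", species="all"))
    print(build_query_url("A1BG", species="mouse"))
```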
@newgene : Would an autocomplete style search be expected to work on this field (or another one)?
@newgene, I'm asking about querying by gene name rather than symbol. A1BG is a symbol. alpha-1B-glycoprotein, HEL-S-163pA, and epididymis secretory sperm binding protein Li 163pA are names. Is querying by gene name supported?
Interestingly, https://mygene.info/v3/query?q=alpha-1B-glycoprotein returns:
{
  "total": 1,
  "took": 3,
  "max_score": 25.8868,
  "hits": [
    {
      "_id": "299963",
      "_score": 25.8868,
      "entrezgene": 299963,
      "name": "similar to alpha 1B-glycoprotein",
      "symbol": "RGD1564515",
      "taxid": 10116
    }
  ]
}
That result is missing the correct gene (entrezgene == 1), which has an exact match as a name.
@dhimmel yes, I read your post too quickly and only then realized you were asking about querying by gene name. You actually need to query like this:
http://mygene.info/v3/query?q="alpha-1-B glycoprotein"
or
http://mygene.info/v3/query?q=name:"alpha-1-B glycoprotein"
Looks like the dash in your original query made the difference.
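Because the quoted phrase contains a space and double quotes, it helps to URL-encode it before sending the request. Below is a minimal Python sketch under stated assumptions: `name_phrase_url` is a hypothetical helper, and the name is assumed to contain no double quotes of its own.

```python
from urllib.parse import urlencode


def name_phrase_url(name, fielded=True):
    """Return a query URL that searches for `name` as a quoted phrase.

    Hypothetical helper for illustration; assumes `name` contains no
    double quotes. With fielded=True the phrase is restricted to the
    "name" field, matching the second URL form quoted in this thread.
    """
    phrase = f'"{name}"'
    q = f"name:{phrase}" if fielded else phrase
    # urlencode turns the space into "+" and the quotes into %22.
    return "http://mygene.info/v3/query?" + urlencode({"q": q})


if __name__ == "__main__":
    print(name_phrase_url("alpha-1-B glycoprotein", fielded=False))
    print(name_phrase_url("alpha-1-B glycoprotein"))
```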
@cgreene this might be similar to what you need:
https://bitbucket.org/sulab/mygene.autocomplete/overview
Note that you can customize the query to what you need, like this line:
"q": "(symbol:{term} OR symbol: {term}* OR name:{term}* OR alias: {term}* OR summary:{term}*)",
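To illustrate how such a template expands, here is a small Python sketch. It assumes the {term} placeholder behaves like a Python `str.format` field; how the mygene.autocomplete code actually performs the substitution is not shown in this thread, and `autocomplete_query` is a hypothetical name. Terms containing spaces or query-syntax characters would need escaping, which this sketch does not attempt.

```python
# Template copied from the line quoted above.
TEMPLATE = (
    "(symbol:{term} OR symbol: {term}* OR name:{term}* "
    "OR alias: {term}* OR summary:{term}*)"
)


def autocomplete_query(term):
    """Fill the template with a partial term typed by the user.

    Hypothetical helper: it only strips surrounding whitespace and does
    not escape special query characters.
    """
    return TEMPLATE.format(term=term.strip())


if __name__ == "__main__":
    print(autocomplete_query("A1B"))
```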
@newgene got it. The preferred name section of the Entrez Gene website is confusing. The primary name for A1BG is "alpha-1-B glycoprotein". For some reason, the Entrez Gene webpage contains a field for preferred names that lists "alpha-1B-glycoprotein".
So it looks like MyGene gene queries search primary names but not alternatives. This issue is a feature request to also search the alternative names available in Entrez Gene.
@newgene : What I'm really asking is whether an ngram tokenizer is used for those fields. I'm trying to figure out if partial queries will return sensible matches. I searched for ngram_filter and didn't find anything in the source.
I poked around in this https://github.com/SuLab/mygene.info/blob/master/src/utils/es.py a bit, but I didn't find anything obvious right off hand and thought you might know.
@cgreene I think you're asking about partial search terms. For example, does https://mygene.info/v3/query?q=alpha-1-B%20glycoprot return a superset of the results that https://mygene.info/v3/query?q=alpha-1-B%20glycoprotein returns? It appears not, but I suggest you open a new issue, since this issue is for searching by alternate names.
Good point @dhimmel. Opened #4 to focus on this.
@dhimmel I confirmed that the alternative names under the "General protein information" section of the NCBI A1BG page are not included in the current MyGene.info API. We will look into including them in a future release; you should then be able to retrieve those hits using these alternative names.
@dhimmel @cgreene just want to let you guys know that we have now included those alternative names from NCBI for every gene object, under the field name "other_names":
http://mygene.info/v3/gene/1017?fields=other_names
and
http://mygene.info/v3/query?q=other_names:cyclin-dependent%20kinase
(Note that your original example gene 299963 no longer has any alternative names from NCBI, so it currently has no other_names field.)
For now, the "other_names" field is not included in unfielded queries (where you pass a term directly to "q" without specifying a field), so you will need to add the field name prefix explicitly in the query. We can re-evaluate this based on user feedback.
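One way to cover both cases in a single request might be to OR the unfielded term with an explicit other_names clause. The Python sketch below only builds such a query string. The OR operator and quoted-phrase syntax both appear elsewhere in this thread, but whether the service handles this particular combination as intended is an assumption, and both helper names are hypothetical.

```python
from urllib.parse import urlencode


def with_other_names(term):
    """Combine an unfielded phrase with an explicit other_names clause.

    Hypothetical helper; assumes `term` contains no double quotes.
    """
    phrase = f'"{term}"'
    return f"{phrase} OR other_names:{phrase}"


def other_names_query_url(term):
    """Return a URL-encoded query URL for the combined query."""
    params = {"q": with_other_names(term)}
    return "http://mygene.info/v3/query?" + urlencode(params)


if __name__ == "__main__":
    print(with_other_names("HEL-S-163pA"))
    print(other_names_query_url("HEL-S-163pA"))
```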
Thanks @newgene! Cognoma decided to go with mygene.info for this service, so you may hear some more from us 👍
@cgreene Awesome! And you should hear from us soon about a feature we are adding to let our users better customize their queries (like your auto-suggestion use case).
@newgene thanks! Confirming the functionality based on the original example.
https://mygene.info/v3/query?q=other_names:HEL-S-163pA is returning (as expected):
{
  "total": 1,
  "max_score": 12.30816,
  "took": 14,
  "hits": [
    {
      "_id": "1",
      "_score": 12.30816,
      "entrezgene": 1,
      "name": "alpha-1-B glycoprotein",
      "symbol": "A1BG",
      "taxid": 9606
    }
  ]
}
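A check like this can be automated by parsing the response and looking for the expected Entrez Gene ID among the hits. The following Python sketch works on the response text shown above rather than calling the service. The function names are hypothetical, and the code assumes only the "hits" and "entrezgene" keys visible in that response.

```python
import json

# Response text copied from the result shown above.
SAMPLE_RESPONSE = """
{
  "total": 1,
  "max_score": 12.30816,
  "took": 14,
  "hits": [
    {"_id": "1", "_score": 12.30816, "entrezgene": 1,
     "name": "alpha-1-B glycoprotein", "symbol": "A1BG", "taxid": 9606}
  ]
}
"""


def entrez_ids(response_text):
    """Return the entrezgene values of all hits, in response order.

    Hits without an "entrezgene" key are skipped.
    """
    data = json.loads(response_text)
    return [hit["entrezgene"] for hit in data.get("hits", [])
            if "entrezgene" in hit]


def contains_gene(response_text, entrezgene):
    """Return True if a hit with the given Entrez Gene ID is present."""
    return entrezgene in entrez_ids(response_text)


if __name__ == "__main__":
    print(entrez_ids(SAMPLE_RESPONSE))
    print(contains_gene(SAMPLE_RESPONSE, 1))
```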
@cgreene suggested we look into mygene.info for Project Cognoma: https://github.com/cognoma/core-service/issues/29#issuecomment-252601701. My first impression is that this is a really awesome service that will help us a lot.
When I tried searching mygene.info/v3/query by gene name, no results were returned. By name I mean that A1BG has the following Entrez Gene information:
Preferred Names
Names
Is this feature missing because biologists usually search by symbol? It seems like there would be many situations where name search would help you identify a gene you were interested in.