biothings / mygene.py

mygene is an easy-to-use Python wrapper to access MyGene.Info services.

Unable to query all taxa #8

Closed fungs closed 5 years ago

fungs commented 5 years ago

The online API does not restrict taxa by default, but I cannot use the Python query() function without the results being restricted to the default human, rat and mouse. I don't quite understand these settings. Such a default is rather useless when working in microbiology, for instance.

How can this setting be disabled?

Thanks!

newgene commented 5 years ago

@fungs you can always pass "species=all" to query against all species. This is described in the MyGene.info API documentation:

http://docs.mygene.info/en/latest/doc/data.html#species

But you are right, the mygene.py documentation does not mention this "species=all" option. We will update the documentation there shortly. Thanks!

newgene commented 5 years ago

@fungs I should mention that the query method should already search all species by default:

import mygene
mg = mygene.MyGeneInfo()
mg.query('cdk2')      # same results as the next line
mg.query('cdk2', species='all')      # same as the last line
mg.query('cdk2', species='human,mouse,rat')       # this restricts to human, mouse and rat only
In [10]: mygene.__version__
Out[10]: '3.0.0'

Are you using the latest mygene python package?

fungs commented 5 years ago

I was using mygene 3.0.0 and tried species="all" as a query parameter, but the query was still limited. I compared with the direct JSON results. Not sure why I get these results.

newgene commented 5 years ago

Can you try this and check what your debug output is:

import logging
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True

res = mg.query('cdk2')

I have this output:

In [16]: res = mg.query('cdk2')
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): mygene.info
DEBUG:urllib3.connectionpool:http://mygene.info:80 "GET /v3/query?q=cdk2 HTTP/1.1" 200 383

fungs commented 5 years ago

Thanks for the helpful debugging hints! It turned out that the cause was the size parameter limit (default=10) and the structure of the returned object, which I had not understood properly. Otherwise, the package is behaving correctly. Still, I was looking for the corresponding code in the git master branch but could not find it (see #7).
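For anyone who runs into the same confusion, here is a minimal sketch of how the returned object can be inspected. It assumes the response is a dict with a "total" count and a "hits" list whose entries carry a "taxid" field. The helper name summarize_query_result and the sample dict are made up for illustration and are not real API output.

```python
from typing import Any, Dict, List, Tuple


def summarize_query_result(res: Dict[str, Any]) -> Tuple[int, int, List[int]]:
    """Return (total matches, hits on this page, sorted distinct taxids)."""
    hits = res.get("hits", [])
    # Collect the distinct taxonomy IDs present in the returned page of hits.
    taxids = sorted({hit["taxid"] for hit in hits if "taxid" in hit})
    # Fall back to the page length if the response has no "total" field.
    return res.get("total", len(hits)), len(hits), taxids


# Hypothetical, hand-written response (NOT real MyGene.info output):
# 25 matches overall, but only 3 hits in the returned page.
example = {
    "total": 25,
    "hits": [
        {"_id": "id-1", "symbol": "CDK2", "taxid": 9606},
        {"_id": "id-2", "symbol": "Cdk2", "taxid": 10090},
        {"_id": "id-3", "symbol": "Cdk2", "taxid": 10116},
    ],
}

total, returned, taxids = summarize_query_result(example)
print(f"{returned} of {total} hits returned; taxids on this page: {taxids}")
```

If "total" is larger than the number of returned hits, the page was cut off by the size limit rather than by a species filter. With the real client, something like summarize_query_result(mg.query('cdk2', species='all', size=100)) should show more taxa. The size value of 100 is only an example.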

newgene commented 5 years ago

And @fungs, I want to mention that, for microbiology studies, this feature might be useful for you:

http://mygene.info/blog/query-genes-beyond-species-at-levels-of-genus-family-phylum

It's a rather old blog post, but if you change the API URL from v2 to v3, the feature still works.
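As a rough sketch of what such a v3 request could look like, the snippet below only builds the URL and does not send it. It assumes the feature is switched on with an include_tax_tree=true parameter next to species, and it uses taxid 1386 (the genus Bacillus in NCBI Taxonomy) as an example. Both are assumptions to check against the blog post, and build_tax_tree_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_tax_tree_query_url(
    q: str,
    taxid: int,
    base: str = "http://mygene.info/v3/query",
) -> str:
    """Build a v3 query URL restricted to a taxonomy node and its children."""
    params = {
        "q": q,
        "species": str(taxid),       # taxid of a genus, family, phylum, ...
        "include_tax_tree": "true",  # assumed parameter name, see note above
    }
    return f"{base}?{urlencode(params)}"


# Example: query "cdk2" under taxid 1386 (assumed here to be the genus Bacillus).
print(build_tax_tree_query_url("cdk2", 1386))
```

The printed URL can be opened in a browser or passed to any HTTP client to compare against what the Python package returns.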