biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Incomplete search results for uncharacterized pig genes #19

Closed cmungall closed 6 years ago

cmungall commented 6 years ago

E.g.

mygene.info/v3/query?q=A0A075B7H6
mygene.info/v3/query?q=ENSSSCG00000030825

Neither query returns any results.

newgene commented 6 years ago

@cmungall this is related to the default species parameter setting in mygene.info API.

MyGene.info APIs support a "species" parameter to filter the returned genes by species. For the query endpoint (/v3/query), it defaults to "human,mouse,rat", which is why the queries above return nothing. Adding "species=pig" (or simply "species=all") should give you what you want:

http://mygene.info/v3/query?q=A0A075B7H6&species=pig
http://mygene.info/v3/query?q=ENSSSCG00000030825&species=pig
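For anyone scripting these lookups, here is a minimal sketch of building the same URLs with Python's standard library. `build_query_url` and `fetch` are hypothetical helper names, not part of any MyGene.info client; only the endpoint and the "q" and "species" parameters come from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Query endpoint as quoted in this thread.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(q, species=None):
    """Build a /v3/query URL (hypothetical helper).

    Leaving `species` as None omits the parameter, so the server-side
    default applies.
    """
    params = {"q": q}
    if species is not None:
        params["species"] = species
    # safe="," keeps comma-separated species lists readable in the URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def fetch(url):
    """Fetch a URL and parse the JSON body (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    for term in ("A0A075B7H6", "ENSSSCG00000030825"):
        print(build_query_url(term, species="pig"))
```

Passing `species="all"` instead of `"pig"` widens the search to every species, as described above.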

There is an ongoing debate about whether we should simply set the default "species" to "all", so that your queries above would work without passing "species". The initial reason for the default of "human,mouse,rat" is just to avoid returning too many matched genes from all species (e.g. ?q=cdk2&species=all). We thought (at least when we made that decision) that this was not what most of our users wanted, and "human,mouse,rat" are still the species our users query most often. But we would like to hear from our users, and we can change the default behavior if they prefer it the other way.

Ref: http://docs.mygene.info/en/latest/doc/query_service.html#id8

cmungall commented 6 years ago

got it, didn't RTFM closely enough.

> The initial reason for the default of "human,mouse,rat" is just to avoid returning too many matched genes from all species (e.g. ?q=cdk2&species=all)

One approach would be to page results, but boost the favored species to the top of the list (there is nothing worse than people having to go to the Nth page of results to get to the first human gene - I know because we've accidentally implemented things that way before!)

andrewsu commented 6 years ago

As I mentioned to @newgene a few minutes ago, I vote in favor of changing that default behavior to search all species (while boosting human and common model organisms, which I think we might already do...). I think the restricted default made sense at one point, but no longer...

newgene commented 6 years ago

@cmungall yes, we are actually doing that already: returned hits are ordered human>mouse>rat>other species. For a query like q=cdk2, a symbol match ranks above a name match, and so on. This is probably another reason we should switch to "species=all" by default now. I'm also in favor of this change now.
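To illustrate the ordering described here, a rough client-side sketch follows. It is not the service's actual implementation: the strict tiering by species, the `rank_hits` helper, and the sample hits are all assumptions made for illustration, and the real ranking may instead fold species into the relevance score. The taxonomy IDs are the NCBI ones for human (9606), mouse (10090), rat (10116) and pig (9823).

```python
# Assumed priority tiers: lower rank sorts first.
SPECIES_PRIORITY = {9606: 0, 10090: 1, 10116: 2}  # human, mouse, rat
OTHER_RANK = len(SPECIES_PRIORITY)  # every other species shares the last tier


def rank_hits(hits):
    """Sort hits by species tier, then by descending score (hypothetical)."""
    return sorted(
        hits,
        key=lambda hit: (
            SPECIES_PRIORITY.get(hit["taxid"], OTHER_RANK),
            -hit["_score"],
        ),
    )


if __name__ == "__main__":
    # Made-up hits for illustration only; scores are not real.
    sample = [
        {"symbol": "CDK2", "taxid": 9823, "_score": 9.0},
        {"symbol": "Cdk2", "taxid": 10116, "_score": 5.0},
        {"symbol": "CDK2", "taxid": 9606, "_score": 3.0},
        {"symbol": "Cdk2", "taxid": 10090, "_score": 4.0},
    ]
    for hit in rank_hits(sample):
        print(hit["taxid"], hit["symbol"], hit["_score"])
```

With this kind of ordering, a human hit stays on the first page even when an all-species search matches many genes.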

newgene commented 6 years ago

We have now switched to "species=all" as the default in our recent release:

http://biothings.io/new-default-behavior-for-species-parameter/