@dongbohu All species with gene info are supported in MyGene.info. Those nine common species are the default, but you can pass a list of taxids to the `species` parameter to limit your query to a different set of species (e.g. `species=208964`). You can also pass `species=all` to include all species.
More details are here:
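For anyone following along, here is a minimal sketch of how the `species` parameter described above could be added to a query URL. The helper name `build_query_url` is made up for illustration and is not part of any MyGene.info client; the endpoint and parameter names are the ones used in this thread.

```python
from urllib.parse import urlencode

# Endpoint as it appears in this thread.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(q, species=None, limit=None):
    """Build a query URL, optionally restricted by species (hypothetical helper)."""
    params = {"q": q}
    if species is not None:
        # Accept the keyword "all", a single taxid, or a list of taxids.
        if isinstance(species, (list, tuple)):
            species = ",".join(str(taxid) for taxid in species)
        params["species"] = str(species)
    if limit is not None:
        params["limit"] = str(limit)
    # Keep ":" and "," readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":,")


print(build_query_url("GO:0006595", species=208964))
print(build_query_url("GO:0006595", species="all", limit=50))
```

The sketch only builds the URL; fetching it (for example with `urllib.request` or `requests`) is left out so that it runs without network access.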
@newgene Thank you very much for your reply. I tried this query:
http://mygene.info/v3/query?q=GO:0006595&species=208964
and it returns nothing. But when I check the file that I downloaded from the Gene Ontology website:
ftp://ftp.geneontology.org/go/gene-associations/gene_association.pseudocap.gz
(last updated on 06/02/2020), I find two genes associated with this GO id (`PA0321` and `aphA`). Is it because the database in mygene.info hasn't been updated?
@dongbohu You're right. We did exclude certain species when loading the GO data into MyGene.info. We will add that soon. Will keep you updated!
Thank you @kevinxin90
@kevinxin90 Any progress on this issue? Thanks.
@dongbohu Hi Dongbu, I'm sorry for the delay. The last couple of weeks have been very busy for the team. When I double-checked our data plugin for Gene Ontology in MyGene.info, the parser looked fine; we don't drop any species from the data source. The reason you don't see this pseudocap species is that it's not included in the source file. (FYI, we download the gene2go.gz file from the NCBIGene FTP site: https://ftp.ncbi.nih.gov/gene/DATA/.) I did a grep on that file and found that the species you are interested in is not included there.

So, to help you as our valued user, I created a pending API here: https://pending.biothings.io/pseudocap_go. It's an API served on top of the file you provided, and it shares the same query syntax as MyGene.info. Some example queries are https://pending.biothings.io/pseudocap_go/gene/883119 and https://pending.biothings.io/pseudocap_go/query?q=go.id:GO\:0052621. Feel free to try it out and let us know if it works for your case. (The _id is the Entrez id, same as in mygene.info.)

As for the future, we plan to integrate the GO annotation for this species into MyGene as soon as NCBIGene (our data source) provides it.
Thanks again!
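The example query above escapes the colon inside the GO id (`GO\:0052621`) so that it is not read as a second field separator. A rough sketch of building such a fielded query in Python follows; the helper names `go_id_query` and `pending_query_url` are hypothetical, and the field name `go.id` is taken from the example URL in this comment.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Pending API endpoint as given in this thread.
PENDING_URL = "https://pending.biothings.io/pseudocap_go/query"


def go_id_query(go_id, field="go.id"):
    # Escape the colon inside the GO id so only the first colon
    # separates the field name from the value.
    escaped = go_id.replace(":", "\\:")
    return f"{field}:{escaped}"


def pending_query_url(go_id):
    # urlencode percent-encodes the backslash and colons; the server
    # side is expected to decode them back to the original query string.
    return PENDING_URL + "?" + urlencode({"q": go_id_query(go_id)})


url = pending_query_url("GO:0052621")
print(url)
# Round-trip check: decoding the URL recovers the escaped query string.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Percent-encoding the query is a design choice here: it avoids sending a raw backslash in the URL, which some HTTP tools rewrite or reject.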
@kevinxin90 Thank you for your reply. The pending API is very helpful. Are you going to keep it permanently, or is it temporary?
Another question: Is it possible that you can add genes of Saccharomyces cerevisiae (aka yeast, Taxonomy ID `4932`) to mygene.info?
Never mind. I realized that taxid `4932` is supported by mygene.info, but it is not included when searching by GO term. Is it also because this species is not included in https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz? Can you add this species to the pending API too?
@dongbohu Regarding the first question for the pending API, that will be permanent.
For the second question, I did a grep on that file from NCBIGene (which we use to ingest data into MyGene.info), and 4932 is not one of the species included there. I also looked at the FTP site you provided earlier and didn't see yeast info there either. (Did I miss it?) If you could point me to the data source for download, we can certainly help set up a pending API for you quickly. Thanks!
@kevinxin90 I went over the files in ftp://ftp.geneontology.org/go/gene-associations/ and the NCBI website. It turns out that most genes whose `tax_id`s are 4932 have been reassigned to tax_id `559292`, which is already supported in mygene.info, so you guys don't need to do anything.
Thank you very much for your help.
@dongbohu Sounds good! So I will close this issue now. But in case you need additional support for GO in other species, feel free to reopen the ticket and let us know.
Thank you!
When querying the genes by GO term like this: http://mygene.info/v3/query?q=GO:0006595&limit=50
The matched genes are associated with 9 `taxid`s (9606, 10090, 10116, 4896, 9031, 7955, 352472, 559292, 6239). Is it possible to include genes of other species, such as genes whose `taxid` is `208964` (Pseudomonas aeruginosa)? Thanks.
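For reference, one way to see which taxids a result set covers is to tally the `taxid` field across the returned hits. The sketch below assumes each hit is a dict carrying a `taxid` key, as in the query results described above; the helper name `count_taxids` and the sample hits are made up for illustration and are not real MyGene.info output.

```python
from collections import Counter


def count_taxids(hits):
    """Count how many hits belong to each taxid (hypothetical helper)."""
    # Hits without a "taxid" key are skipped rather than raising.
    return Counter(hit["taxid"] for hit in hits if "taxid" in hit)


# Made-up hits for illustration only; not real API output.
sample_hits = [
    {"_id": "1", "taxid": 9606},
    {"_id": "2", "taxid": 10090},
    {"_id": "3", "taxid": 9606},
]

counts = count_taxids(sample_hits)
print(sorted(counts))  # distinct taxids present in the sample
```

In practice `hits` would be the list parsed from the JSON response of the query URL above.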