biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Include more species in the database? #85

Closed dongbohu closed 3 years ago

dongbohu commented 4 years ago

When querying the genes by GO term like this: http://mygene.info/v3/query?q=GO:0006595&limit=50

The matched genes are associated with 9 taxids (9606, 10090, 10116, 4896, 9031, 7955, 352472, 559292, 6239). Is it possible to include genes of other species, such as genes whose taxid is 208964 (Pseudomonas aeruginosa)? Thanks.

newgene commented 4 years ago

@dongbohu All species with gene info are supported in MyGene.info. Those nine common species are the default, but you can pass a list of taxids to the species parameter to limit your query to a different set of species (e.g. species=208964). You can also pass species=all to include all species.

More details are here:

https://docs.mygene.info/en/latest/doc/data.html#species
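
For illustration, here is a minimal sketch of how the species parameter described above could be attached to a query URL. The endpoint and the `q`, `species`, and `limit` parameters come from the URLs in this thread; the helper name `build_query_url` is hypothetical, and joining several taxids with commas is an assumption about how a list should be passed.

```python
from urllib.parse import urlencode

# Endpoint taken from the query URL quoted earlier in this thread.
BASE_URL = "http://mygene.info/v3/query"


def build_query_url(q, species=None, limit=None):
    """Build a MyGene.info query URL (hypothetical helper, not part of any client library).

    species may be a single taxid, the string "all", or a list of taxids.
    Joining a list with commas is an assumption.
    """
    params = {"q": q}
    if species is not None:
        if isinstance(species, (list, tuple)):
            species = ",".join(str(s) for s in species)
        params["species"] = str(species)
    if limit is not None:
        params["limit"] = limit
    # Keep ":" and "," readable instead of percent-encoding them.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


print(build_query_url("GO:0006595", species=208964))
print(build_query_url("GO:0006595", species="all", limit=50))
```

Omitting `species` leaves the server-side default in place, which per the comment above is the nine common species.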

dongbohu commented 4 years ago

@newgene Thank you very much for your reply. I tried this query: http://mygene.info/v3/query?q=GO:0006595&species=208964 and it returns nothing. But when I check the file that I downloaded from Gene Ontology website: ftp://ftp.geneontology.org/go/gene-associations/gene_association.pseudocap.gz (last updated on 06/02/2020), I find two genes associated with this GO id (PA0321 and aphA). Is it because the database in mygene.info hasn't been updated?
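
A check like the one described above can be sketched in a few lines, assuming the downloaded file follows the standard tab-separated GAF layout (comment lines start with `!`, column 3 is the gene symbol, column 5 is the GO ID). The function name `symbols_for_go_id` is hypothetical.

```python
def symbols_for_go_id(lines, go_id):
    """Return the sorted gene symbols annotated with go_id in GAF-formatted lines.

    Assumes the standard GAF column order: column 3 (index 2) is the
    gene symbol and column 5 (index 4) is the GO ID.
    """
    symbols = set()
    for line in lines:
        if line.startswith("!"):  # GAF header and comment lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 4 and cols[4] == go_id:
            symbols.add(cols[2])
    return sorted(symbols)
```

For the gzipped file, the lines can be read with `gzip.open(path, "rt")` and passed straight to the function.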

kevinxin90 commented 4 years ago

@dongbohu You're right. We did exclude certain species when loading the GO data into MyGene.info. We will add that soon. Will keep you updated!

dongbohu commented 4 years ago

Thank you @kevinxin90

dongbohu commented 3 years ago

@kevinxin90 Any progress on this issue? Thanks.

kevinxin90 commented 3 years ago

@dongbohu Hi Dongbo, I'm sorry for the delay. The last couple of weeks have been very busy for the team.

I double-checked our Gene Ontology data plugin for MyGene.info, and the parser is fine: we don't drop any species from the data source. The reason you don't see this pseudocap species is that it's not included in the source file. (FYI, we download the gene2go.gz file from the NCBIGene FTP site: https://ftp.ncbi.nih.gov/gene/DATA/ ) I ran grep on that file and found that your species of interest is not in it.

To help, I created a pending API here: https://pending.biothings.io/pseudocap_go. It serves the file you provided and shares the same query syntax as MyGene.info. Some example queries are https://pending.biothings.io/pseudocap_go/gene/883119 and https://pending.biothings.io/pseudocap_go/query?q=go.id:GO\:0052621. (The _id is the Entrez ID, the same as in mygene.info.) Feel free to try it out and let us know if it works for your case.

As for future plans, we will integrate the GO annotations for this species into MyGene.info as soon as NCBIGene (our data source) provides them.

Thanks again!
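
The grep check mentioned above can be reproduced roughly as follows. This is only a sketch: it assumes gene2go.gz is tab-separated with the taxid in the first column and a header line starting with `#`, and the function names are hypothetical.

```python
import gzip


def count_taxids(lines, wanted):
    """Count gene2go rows per taxid of interest.

    Assumes tab-separated rows with the taxid in the first column and
    header lines starting with "#". Taxids with no rows are reported as 0.
    """
    counts = {str(t): 0 for t in wanted}
    for line in lines:
        if line.startswith("#"):
            continue
        tax_id = line.split("\t", 1)[0]
        if tax_id in counts:
            counts[tax_id] += 1
    return counts


def count_taxids_in_file(path, wanted):
    """Same check against a local copy of gene2go.gz."""
    with gzip.open(path, "rt") as handle:
        return count_taxids(handle, wanted)
```

A count of 0 for a taxid means the source file has no GO rows for that species, which is the situation described for 208964.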

dongbohu commented 3 years ago

@kevinxin90 Thank you for your reply. The pending API is very helpful. Are you going to keep it permanently, or is it temporary?

dongbohu commented 3 years ago

Another question: Could you add genes of Saccharomyces cerevisiae (a.k.a. yeast, Taxonomy ID 4932) to mygene.info?

dongbohu commented 3 years ago

Never mind. I realized that taxid 4932 is supported by mygene.info, but it is not included when searching by GO term. Is it also because this species is not included in https://ftp.ncbi.nih.gov/gene/DATA/gene2go.gz? Can you add this species to the pending API too?

kevinxin90 commented 3 years ago

@dongbohu Regarding the first question for the pending API, that will be permanent.

For the second question, I ran grep on that file from NCBIGene (which we use to ingest data into MyGene.info), and 4932 is not one of the species included there. I also looked at the FTP site you provided earlier and didn't see yeast info there either. (Did I miss it?) If you can point me to a data source for download, we can quickly set up a pending API for you. Thanks!

dongbohu commented 3 years ago

@kevinxin90 I went over the files in ftp://ftp.geneontology.org/go/gene-associations/ and on the NCBI website. It turns out that most genes whose tax_id is 4932 have been reassigned to tax_id 559292, which is already supported in mygene.info, so you guys don't need to do anything.

Thank you very much for your help.

kevinxin90 commented 3 years ago

@dongbohu Sounds good! So I will close this issue now. But in case you need additional support for GO in other species, feel free to reopen the ticket and let us know.

dongbohu commented 3 years ago

Thank you!