galaxyproteomics / tools-galaxyp

Galaxy Tool Shed repositories maintained and developed by the GalaxyP community
MIT License

Metaproteomic public repositories (from Protein database downloader) #87

Open PratikDJagtap opened 7 years ago

PratikDJagtap commented 7 years ago

Links to metaproteomic public repositories

IDEA: It would be a good idea to add a few common publicly available metaproteomics databases (see the HOMD database below for an example) to the Protein database downloader tool in Galaxy (https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/dbbuilder).

Please suggest links for some useful metaproteomics databases.

HOMD database: ftp://ftp.homd.org/HOMD_annotated_genomes_archive/oral_microbiome_dynamic.aa.zip

bgruening commented 7 years ago

General question: does anyone have good connections to those databases? It would be great to get a tighter integration with Galaxy via data_sources, but for this we need help from the database developers.

jhervey4 commented 7 years ago

Hello: iMicrobe hosts quite a bit of environmental data as well as the CAMERA reference sequences (funded by the Moore Foundation years ago): ftp://ftp.imicrobe.us/ & imicrobe.us.

Hope this helps!

stuppie commented 7 years ago

Human Microbiome Project reference genomes from the GI tract: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Gastrointestinal_tract.pep.fsa
HMP metagenome: ftp://public-ftp.hmpdacc.org/HMGI/stool/

MetaHit: http://www.nature.com/nature/journal/v464/n7285/full/nature08821.html
http://www.bork.embl.de/~arumugam/Qin_et_al_2010/

BGI: http://gutmeta.genomics.org.cn/

PratikDJagtap commented 7 years ago

Based on some of the suggestions above, databases that could be downloaded through the "Metaproteomics database downloader" tool (the Protein database downloader tool in Galaxy, https://github.com/galaxyproteomics/tools-galaxyp/tree/master/tools/dbbuilder) would be:

HMP Airways: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Airways.pep.fsa
HMP Blood: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Blood.pep.fsa
HMP Gastrointestinal tract: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Gastrointestinal_tract.pep.fsa
HMP Oral: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Oral.pep.fsa
HMP Skin: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Skin.pep.fsa
HMP Urogenital tract: http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Urogenital_tract.pep.fsa
HMP All body sites: http://downloads.hmpdacc.org/data/reference_genomes/all_pep_20141006.tar.gz

ftp://public.genomics.org.cn/BGI/gutmeta/UniSet/UniGene.pep.gz
http://www.bork.embl.de/~arumugam/Qin_et_al_2010/frequent_microbe_proteins.fasta.gz
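
To make the list above easier to discuss as tool input, here is a minimal Python sketch of how such sources could be catalogued and inspected before download. The dictionary name `METAPROTEOMIC_SOURCES`, the labels, and the helper `describe_source` are hypothetical and are not part of the existing dbbuilder tool; only the URLs come from this thread. The sketch does no network access; it only reports the URL scheme and whether the file would need unpacking.

```python
import posixpath
from urllib.parse import urlparse

# URLs copied from this thread; the labels are illustrative only.
METAPROTEOMIC_SOURCES = {
    "HOMD": "ftp://ftp.homd.org/HOMD_annotated_genomes_archive/oral_microbiome_dynamic.aa.zip",
    "HMP Oral": "http://downloads.hmpdacc.org/data/reference_genomes/body_sites/Oral.pep.fsa",
    "HMP All body sites": "http://downloads.hmpdacc.org/data/reference_genomes/all_pep_20141006.tar.gz",
    "BGI gut UniGene": "ftp://public.genomics.org.cn/BGI/gutmeta/UniSet/UniGene.pep.gz",
}


def describe_source(url):
    """Return the scheme, file name and archive type implied by a URL (hypothetical helper)."""
    parsed = urlparse(url)
    filename = posixpath.basename(parsed.path)
    # Check ".tar.gz" before ".gz" so tarballs are not mistaken for plain gzip files.
    if filename.endswith(".tar.gz"):
        archive = "tar.gz"
    elif filename.endswith(".gz"):
        archive = "gz"
    elif filename.endswith(".zip"):
        archive = "zip"
    else:
        archive = "none"
    return {"scheme": parsed.scheme, "filename": filename, "archive": archive}


if __name__ == "__main__":
    for label, url in METAPROTEOMIC_SOURCES.items():
        print(label, describe_source(url))
```

A table like this would show up front that the sources mix FTP and HTTP and use at least three archive formats, which any downloader interface would have to handle.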

Unfortunately, UniMES (http://www.uniprot.org/help/unimes) has been retired. However, it would be great if we could access information from https://www.ebi.ac.uk/metagenomics/ and generate customized databases using WGS data for SixGill, Omega2 and 16S rRNA sequencing data using the proposed UniProt API.

Ideas from @alessandrotanca, Prof. Rudney, Carolin Kolmeder, Tim Griffin, @jhervey4, @stuppie, @bgruening and @jj-umn would be useful on which interface would be best for downloading data that can be used as input for further utilization / analysis.

ckolm commented 7 years ago

The data from Qin et al. 2010 were extended in 2014 by Li et al. (http://www.nature.com/nbt/journal/v32/n8/full/nbt.2942.html; behind this link are the ENA numbers from which to derive the data). And just an anecdotal note: UniMES was not a very deep collection of data anyway.

alessandrotanca commented 7 years ago

We have also had good feedback regarding the human gut microbiota public database mentioned by Carolin. Another interesting public database, this one based on the mouse gut microbiota, can be found here: http://gigadb.org/dataset/view/id/100114/token/mZlMYJIF04LshpgP

ckolm commented 7 years ago

It might already be on Galaxy, but a tool for downloading genomes from NCBI (GenBank) would be useful. One such tool can be found here: https://github.com/kblin/ncbi-genome-download. For earlier work we used in-house Python scripts to download a set of genomes from NCBI (human intestine related microbes; the list can be found in Kolmeder et al., PLoS ONE 11(4):e0153294, Table S3).
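
As a rough illustration of how such a download could be scripted, the sketch below builds (but does not run) a command line for ncbi-genome-download. The function name `build_download_command`, the default output folder, and the example genera are hypothetical; the genera are not taken from Table S3. The `--formats`, `--genera` and `--output-folder` options are used as I understand them from the tool's README, so check `ncbi-genome-download --help` before relying on them.

```python
import shlex


def build_download_command(genera, group="bacteria", output_folder="genomes"):
    """Build an argv list for ncbi-genome-download (hypothetical helper, nothing is executed)."""
    if not genera:
        raise ValueError("at least one genus is required")
    return [
        "ncbi-genome-download",
        "--formats", "protein-fasta",      # protein sequences, suitable for a search database
        "--genera", ",".join(genera),      # comma-separated list of genus names
        "--output-folder", output_folder,  # assumed destination directory
        group,                             # taxonomic group, given as a positional argument
    ]


if __name__ == "__main__":
    # Example genera chosen for illustration only.
    command = build_download_command(["Bacteroides", "Faecalibacterium"])
    print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` once the options have been confirmed against the installed version of the tool.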