gjospin / PhyloSift

Phylogenetic and taxonomic analysis for genomes and metagenomes

bio dataset analysis for paper: Human microbiome #190

Closed koadman closed 12 years ago

koadman commented 12 years ago

Jonathan suggested using the best-understood microbial ecosystem for the paper. We could analyze enterotypes, and at the very least should analyze the HMP mock communities. HMP mock community data is available here:

http://www.ncbi.nlm.nih.gov/bioproject/48475

Once the reads have been downloaded in SRA format, they will need to be converted to FASTQ using the SRA Toolkit. The toolkit is already installed on edhar and can produce a FASTQ file like so:

/home/koadman/software/sratoolkit.2.1.10-ubuntu32/bin/fastq-dump --split-spot poopdna.sra

where poopdna.sra is the name of the SRA file downloaded from NCBI.
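
If several .sra files end up in one directory, the conversion could be looped rather than run by hand. This is a minimal sketch, not something anyone in this thread ran: `convert_all` and the `FASTQ_DUMP` variable are hypothetical names, and it assumes fastq-dump is on the PATH (or that `FASTQ_DUMP` points at the install above) and writes its output to the current directory.

```shell
# Convert every .sra file in a directory to FASTQ.
# FASTQ_DUMP is a hypothetical override for the fastq-dump location.
convert_all() (
    dir=${1:-.}
    bin=${FASTQ_DUMP:-fastq-dump}

    # Work inside the directory so relative file names can be passed through.
    cd "$dir" || exit 1

    for f in *.sra; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] || continue
        # Same flag as the single-file command above.
        "$bin" --split-spot "$f" || echo "conversion failed: $f" >&2
    done
)

# Usage (path is an example only):
#   FASTQ_DUMP=/home/koadman/software/sratoolkit.2.1.10-ubuntu32/bin/fastq-dump
#   convert_all /path/to/downloads
```

The `*.sra` pattern also matches `.lite.sra` files. A failed conversion is reported on stderr and the loop moves on to the next file.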

We should also pick some other shotgun metagenomes from the set available here: http://www.hmpdacc.org/resources/data_browser.php and analyze them.

jonathaneisen commented 12 years ago

I would suggest also considering data sets that have been beaten into submission, like the Sargasso Sea or Ed DeLong's depth series ...

(Sent by email in reply to Aaron Darling's comment of Apr 28, 2012, 2:41 PM.)

hollybik commented 12 years ago

HMP mock data and GOS data are now downloaded to edhar. The FTP commands to fetch the SRA files are as follows:

454 HMP Mock even sample

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030841/SRR072233/SRR072233.lite.sra

454 HMP Mock staggered sample

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030842/SRR072232/SRR072232.lite.sra

Illumina HMP Mock even sample

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055380/SRR172902/SRR172902.lite.sra

Illumina HMP Mock staggered sample

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055381/SRR172903/SRR172903.lite.sra

Venter GOS Data: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA13694

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026986/SRR066138/SRR066138.lite.sra

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026987/SRR066139/SRR066139.lite.sra
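
The URLs above all follow one pattern: the experiment accession's first six characters, then the experiment, then the run. A rough sketch of generating them from accession pairs follows. `lite_sra_url`, `list_urls` and `RUNS` are hypothetical names, and the FTP layout is the one in use in 2012, which NCBI may no longer serve.

```shell
# Build the 2012-era "litesra" FTP URL for one run.
# $1 = experiment accession (e.g. SRX030841), $2 = run accession (e.g. SRR072233)
lite_sra_url() {
    # The directory above the experiment is its first six characters.
    printf 'ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/%s/%s/%s/%s.lite.sra\n' \
        "$(printf '%s' "$1" | cut -c1-6)" "$1" "$2" "$2"
}

# Experiment:run pairs taken from the commands in this comment.
RUNS="SRX030841:SRR072233
SRX030842:SRR072232
SRX055380:SRR172902
SRX055381:SRR172903
SRX026986:SRR066138
SRX026987:SRR066139"

# Print one URL per pair; nothing is downloaded here.
list_urls() {
    printf '%s\n' "$RUNS" | while IFS=: read -r srx srr; do
        lite_sra_url "$srx" "$srr"
    done
}

# Usage: list_urls | wget -i -
```

Printing the URLs first makes it easy to eyeball the list before handing it to wget, which reads URLs from standard input when given `-i -`.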

koadman commented 12 years ago

I downloaded all 111 of the 454 datasets from the recent human microbiome paper to edhar. I propose we use these to show off PhyloSift's community structure comparison skillz.

The trouble with using GOS is that demonstrating an ability to draw inferences from oodles of Sanger data is effectively irrelevant: nobody can do that kind of sequencing anymore. I even worry that 454 data is becoming obsolete, but Ion Torrent data is similar enough that I think it's still relevant in the near term. I appreciate that we want to use a well-understood ecosystem, and for that GOS is great; hopefully human gut is an OK compromise here.

hollybik commented 12 years ago

Data located and ready to run - going to close this issue