koadman closed this issue.
I would suggest also considering data sets that have been beaten into submission, like the Sargasso Sea or Ed DeLong's depth series ...
On Apr 28, 2012, at 2:41 PM, Aaron Darling wrote:
Jonathan suggested using the best-understood microbial ecosystem for the paper. We could analyze enterotypes, and at the very least should analyze the HMP mock communities. HMP mock community data is available here:
http://www.ncbi.nlm.nih.gov/bioproject/48475
Once the reads have been downloaded in SRA format, they will need to be converted to FASTQ using the SRA Toolkit. The toolkit is already installed on edhar and can be used to produce FASTQ like so:
/home/koadman/software/sratoolkit.2.1.10-ubuntu32/bin/fastq-dump --split-spot poopdna.sra
where poopdna.sra is the name of the SRA file downloaded from NCBI.
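With several runs to convert, the same invocation can be wrapped in a loop. The sketch below is a minimal, hypothetical helper (`convert_all` is not part of the SRA Toolkit); it assumes the fastq-dump path on edhar given above and lets you override it through a `FASTQ_DUMP` environment variable on other machines. It only reuses the `--split-spot` flag already shown and makes no assumption about output file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run fastq-dump --split-spot on every .sra file given.
# FASTQ_DUMP defaults to the copy on edhar mentioned in this thread (assumption:
# override it when running elsewhere).
set -euo pipefail

FASTQ_DUMP="${FASTQ_DUMP:-/home/koadman/software/sratoolkit.2.1.10-ubuntu32/bin/fastq-dump}"

# convert_all FILE.sra... : converts each file in turn; returns non-zero if
# no files were given, fastq-dump is missing, or any conversion failed.
convert_all() {
  if [ "$#" -eq 0 ]; then
    echo "usage: convert_all FILE.sra..." >&2
    return 1
  fi
  if [ ! -x "$FASTQ_DUMP" ]; then
    echo "fastq-dump not found at $FASTQ_DUMP" >&2
    return 1
  fi
  local f status=0
  for f in "$@"; do
    echo "converting $f" >&2
    # Keep going after a failure, but remember it for the return code
    "$FASTQ_DUMP" --split-spot "$f" || status=1
  done
  return "$status"
}

# When executed directly with arguments, convert them; when sourced, only
# define the function.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
  convert_all "$@"
fi
```

For example, `bash convert_all.sh *.lite.sra` in the download directory would convert every run fetched with the wget commands in this thread. Continuing past a failed file (rather than stopping) is a deliberate choice so that one corrupt download does not block the rest.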
We should also pick some other shotgun metagenomes from the set available at http://www.hmpdacc.org/resources/data_browser.php and analyze them.
GitHub issue: https://github.com/gjospin/PhyloSift/issues/190
I've downloaded the HMP mock community data and the GOS data to edhar. The FTP commands to fetch the SRA data are as follows:
454 HMP Mock even sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030841/SRR072233/SRR072233.lite.sra
454 HMP Mock staggered sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX030/SRX030842/SRR072232/SRR072232.lite.sra
Illumina HMP Mock even sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055380/SRR172902/SRR172902.lite.sra
Illumina HMP Mock staggered sample
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX055/SRX055381/SRR172903/SRR172903.lite.sra
Venter GOS Data: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA13694
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026986/SRR066138/SRR066138.lite.sra
wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra/SRX/SRX026/SRX026987/SRR066139/SRR066139.lite.sra
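All of the URLs above follow one pattern: `.../litesra/SRX/<first six characters of the SRX accession>/<SRX>/<SRR>/<SRR>.lite.sra`. The sketch below rebuilds them from experiment/run accession pairs so additional runs can be fetched the same way. The layout is inferred only from the URLs in this thread (2012); NCBI may no longer serve the sra-instant tree, and `sra_lite_url` and `RUNS` are hypothetical names. It prints wget commands rather than running them, so you can review the list first.

```shell
#!/usr/bin/env bash
# Rebuild the sra-instant "litesra" FTP URLs used in this thread from
# (SRX, SRR) accession pairs. Directory layout inferred from the URLs above;
# it may no longer be served by NCBI.
set -euo pipefail

SRA_INSTANT_BASE="ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByExp/litesra"

# sra_lite_url SRX SRR : print the .lite.sra URL for one run
sra_lite_url() {
  local srx="$1" srr="$2"
  # Second directory level is "SRX" plus the first three digits, e.g. SRX030
  local srx_prefix="${srx:0:6}"
  printf '%s/SRX/%s/%s/%s/%s.lite.sra\n' \
    "$SRA_INSTANT_BASE" "$srx_prefix" "$srx" "$srr" "$srr"
}

# Experiment/run pairs copied from the wget commands above
RUNS=(
  "SRX030841 SRR072233"  # 454 HMP mock, even
  "SRX030842 SRR072232"  # 454 HMP mock, staggered
  "SRX055380 SRR172902"  # Illumina HMP mock, even
  "SRX055381 SRR172903"  # Illumina HMP mock, staggered
  "SRX026986 SRR066138"  # Venter GOS
  "SRX026987 SRR066139"  # Venter GOS
)

# Print one wget command per run (-c resumes partial downloads).
# Pipe this script's output to "sh" to actually download.
for pair in "${RUNS[@]}"; do
  read -r srx srr <<< "$pair"
  echo "wget -c $(sra_lite_url "$srx" "$srr")"
done
```

Printing commands instead of calling wget directly keeps the script safe to run on any machine and makes it easy to diff the generated list against the commands posted above.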
I downloaded all 111 of the 454 datasets from the recent human microbiome paper to edhar. I propose we use these to show off PhyloSift's community structure comparison skillz.
The trouble with using GOS is that demonstrating an ability to draw inferences from oodles of Sanger data is effectively irrelevant: almost nobody does that kind of sequencing anymore. I even worry that 454 data is becoming obsolete, but Ion Torrent data is similar enough that I think it's still relevant in the near term. I appreciate that we want to use a well-understood ecosystem, and for that GOS is great; hopefully the human gut is an OK compromise here.
Data located and ready to run; closing this issue.