
caporaso-lab / mockrobiota

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.

http://mockrobiota.caporasolab.us

BSD 3-Clause "New" or "Revised" License


mock-5 data #30

Closed wmercurio closed 8 years ago

wmercurio commented 8 years ago

For mock-5, there is an issue with the files contained in the raw-data-url: ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad3/

There are three files there: combined.read1, combined.read2, and combined.i

The combined.i file is double-zipped, unlike the other two, and it contains full reads rather than barcodes. I am not positive, but it looks like it could be read1 and read2 combined into one file.

nbokulich commented 8 years ago

The raw data have been updated to correct this issue, and new links are now available in dataset-metadata.tsv.

mdeleeuw commented 7 years ago

The index reads lack the barcode sequence in the FASTQ header. The following command must be run before calling join_paired_ends.py and/or split_libraries_fastq.py:

gunzip -c mock-index-read.fastq.gz | paste - - - - | awk '{print $1" "$2$3 ; print $3; print $4; print $5}' | gzip -c > mock-index-read.corrected.fastq.gz
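For anyone who would rather not depend on awk, the same correction can be sketched in Python. This is only an illustration of what the one-liner above does (append each index read's sequence to its own header line), not something taken from the repository. The function names are hypothetical. One difference to be aware of: the awk version relies on the header containing exactly one space, while this sketch appends the sequence to the header regardless of how many spaces it contains.

```python
import gzip
from itertools import islice


def correct_index_reads(lines):
    """Yield corrected FASTQ lines, consuming four input lines (one record) at a time.

    The read sequence is appended to the end of the header line, mirroring
    the awk command above for headers that contain a single space.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = (line.rstrip("\n") for line in record)
        yield header + seq  # header now ends with the barcode sequence
        yield seq
        yield plus
        yield qual


def correct_file(src, dst):
    """Read a gzipped FASTQ file and write a gzipped, corrected copy (hypothetical helper)."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for line in correct_index_reads(fin):
            fout.write(line + "\n")
```

Calling correct_file("mock-index-read.fastq.gz", "mock-index-read.corrected.fastq.gz") should then produce a file equivalent to the output of the shell command, assuming the input headers have the single-space layout that command expects.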

nbokulich commented 7 years ago

Thank you, @mdeleeuw. I will add your comments here and on this issue to the README.md pages for these datasets. I have added these README.md files, located in the home directory for each dataset, to list notes on each dataset and share tips for using the data. Your input has helped guide the creation of these files.