wmercurio closed 8 years ago
The raw data have been updated to correct this issue, and new links are now available in dataset-metadata.tsv
The index reads lack the barcode sequence in the FASTQ header. Run the following command on the index file before calling join_paired_ends.py and/or split_libraries_fastq.py:
gunzip -c mock-index-read.fastq.gz | paste - - - - | awk '{print $1" "$2$3 ; print $3; print $4; print $5}' | gzip -c > mock-index-read.corrected.fastq.gz
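For anyone who prefers to do this step in Python, here is a rough equivalent of the one-liner above. It is only a sketch: the function names are my own, and it assumes four-line FASTQ records whose header has a read ID, a space, and a description field (which is what the awk command assumes as well).

```python
import gzip


def correct_header(header, seq):
    """Append the read sequence to the header, like awk's $1" "$2$3."""
    # Split into the read ID and the rest of the header (may be empty).
    fields = header.split(None, 1)
    read_id = fields[0]
    description = fields[1] if len(fields) > 1 else ""
    return f"{read_id} {description}{seq}"


def correct_index_reads(in_path, out_path):
    """Rewrite a gzipped index-read FASTQ with the barcode in each header."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            header = src.readline().rstrip("\n")
            if not header:
                break  # end of file
            seq = src.readline().rstrip("\n")
            plus = src.readline().rstrip("\n")
            qual = src.readline().rstrip("\n")
            dst.write(f"{correct_header(header, seq)}\n{seq}\n{plus}\n{qual}\n")


# Example call, using the file names from the command above:
# correct_index_reads("mock-index-read.fastq.gz",
#                     "mock-index-read.corrected.fastq.gz")
```

One difference from the awk version: if a header ever contains more than two whitespace-separated fields, this sketch keeps the extra fields, whereas the awk command would shift its columns.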
Thank you, @mdeleeuw. I will add your comments here and on this issue to the README.md pages for these datasets. I have added these README.md files, located in the home directory for each dataset, to list notes on each dataset and share tips for using the data. Your input has helped guide the creation of these files.
For mock-5, there is an issue with the files contained in the raw-data-url: ftp://ftp.microbio.me/pub/illumina-mock-communities-raw-data/Broad3/
There are three files there: combined.read1, combined.read2, and combined.i.
Unlike the other two files, combined.i is double-zipped, and it contains full-length reads rather than barcodes. I am not positive, but it looks like it could be a combination of read1 and read2 in one file.