caporaso-lab / mockrobiota

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.
http://mockrobiota.caporasolab.us
BSD 3-Clause "New" or "Revised" License

Mock-3 data #32

Closed: shiffer1 closed this issue 8 years ago

shiffer1 commented 8 years ago

Something is amiss in the header files. I attempted to demultiplex and ran into errors. The first error: skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: A0A3V120410:1:10:10065:26809/1. This may be because you passed an incorrect value for phred_offset.

I forced the Phred Offset to be 33 after taking a quick look at the file and ran again:
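For anyone who wants to check the offset rather than eyeball it, here is a minimal sketch of one common heuristic: look at the range of characters on the quality lines. The function name `guess_phred_offset` and the cutoffs (59 and 74) are illustrative assumptions, not part of skbio or QIIME, and a short or unusual file can still come back ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Hypothetical helper, not an skbio or QIIME function.
    Returns None when the characters seen do not settle the question.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    # Characters below ';' (ASCII 59) cannot occur in Phred+64 data.
    if lowest < 59:
        return 33
    # Characters above 'J' (ASCII 74) are unusual for Phred+33 Illumina data.
    if highest > 74:
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#5", "FFFFFF"]))
```

The quality lines are every fourth line of the FASTQ file, starting from the fourth.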

The 2nd error: qiime.split_libraries_fastq.FastqParseError: Headers of barcode and read do not match. Can't continue. Confirm that the barcode fastq and read fastq that you are passing match one another.

We ran the entire data set through skbio and it did not fail. Walking through one at a time revealed the headers to be identical. Several folks looked at this and were perplexed.
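A header-by-header comparison like the one described could be sketched as follows. This is only an illustration of a strict string comparison between the barcode and read files; it is not the check that qiime.split_libraries_fastq performs internally, and the function names are made up for the example.

```python
from itertools import zip_longest


def fastq_headers(lines):
    """Return the header line of each 4-line FASTQ record."""
    return [line.rstrip("\n") for i, line in enumerate(lines) if i % 4 == 0]


def first_header_mismatch(barcode_lines, read_lines):
    """Return (record_index, barcode_header, read_header) for the first
    pair of headers that differ, or None if every header matches.

    Hypothetical helper for illustration only.
    """
    pairs = zip_longest(fastq_headers(barcode_lines), fastq_headers(read_lines))
    for record_index, (barcode_header, read_header) in enumerate(pairs):
        if barcode_header != read_header:
            return record_index, barcode_header, read_header
    return None
```

Passing two open file handles works, since the helpers only iterate over lines. A result of None means the headers are textually identical, which is consistent with what we saw and suggests the failure comes from how the headers are parsed rather than from a true mismatch.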

nbokulich commented 8 years ago

The issue here appears to be that the original headers use an unconventional nomenclature. Instead of ending in ":1" or ":2" to indicate read number, these end in "/1" or "/2". Correcting this should solve the issue.

Question is: should we make this change and re-deposit all raw data or leave things as-is since this is an easy fix? @gregcaporaso @shiffer1 thoughts?

shiffer1 commented 8 years ago

Let me look at these; I may be able to fix them like I did the others. Thanks, Arron

shiffer1 commented 8 years ago

I'll run these through that fix and see if it solves the problem. Arron

nbokulich commented 8 years ago

Correction: the "normal" header format should end in " 1:N:0:1" or " 2:N:0:1" to indicate read number. Note the leading space. Check out other sequencing runs to compare the usual format in case the examples I checked are out of date. Thanks @shiffer1!
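For reference, a header rewrite along these lines might look like the sketch below. It is an assumption about how the fix could be done, not the script actually used on the deposited data. The names `convert_header` and `convert_fastq_lines` are hypothetical, and the ":N:0:1" fields are simply copied from the suffix quoted above.

```python
import re

# Matches an old-style "/1" or "/2" suffix at the end of a header line.
_OLD_SUFFIX = re.compile(r"/([12])$")


def convert_header(header):
    """Rewrite a trailing "/1" or "/2" as " 1:N:0:1" or " 2:N:0:1".

    Headers without the old-style suffix are returned unchanged.
    """
    return _OLD_SUFFIX.sub(r" \1:N:0:1", header)


def convert_fastq_lines(lines):
    """Yield FASTQ lines (without newlines) with converted headers.

    Only the first line of each 4-line record is touched, so quality
    strings that happen to start with "@" or end in "/1" are left alone.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        if i % 4 == 0:
            text = convert_header(text)
        yield text


if __name__ == "__main__":
    print(convert_header("@A0A3V120410:1:10:10065:26809/1"))
```

The same conversion would need to be applied to the barcode file and the read files so that their headers keep matching.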

nbokulich commented 8 years ago

The raw data have been updated to correct this issue, and new links are now available in dataset-metadata.tsv.