caporaso-lab / mockrobiota

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.
http://mockrobiota.caporasolab.us
BSD 3-Clause "New" or "Revised" License

split_libraries_fastq.py error with golay barcodes (mock-5 & -7) #57

Closed fwhelan closed 7 years ago

fwhelan commented 7 years ago

This must be an issue specific to me since no one else seems to have run into anything similar here. I don't have much experience with golay barcodes, so please mind my ignorance!

I have downloaded mock-5 and corrected the barcode headers as recommended in README.md. When I run split libs, I run into the following error:

mock-5$ split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.corrected.fastq.gz --rev_comp_barcode
Error in split_libraries_fastq.py: Some or all barcodes are not valid golay codes. Do they need to be reverse complemented? If these are not golay barcodes pass --barcode_type 12 to disable barcode error correction, or pass --barcode_type # if the barcodes are not 12 base pairs, where # is the size of the barcodes. Invalid codes:
        AATCAACTAGGC CAAATGGTCGTC ACACATAAGTCG TGTACGGATAAC

If you need help with QIIME, see:
http://help.qiime.org

Similarly, I run into this error with mock-7:

mock-7$ split_libraries_fastq.py -i mock-forward-read.fastq -o split_libraries -m sample-metadata.tsv -b mock-index-read.fastq --rev_comp_barcode
Error in split_libraries_fastq.py: Some or all barcodes are not valid golay codes. Do they need to be reverse complemented? If these are not golay barcodes pass --barcode_type 12 to disable barcode error correction, or pass --barcode_type # if the barcodes are not 12 base pairs, where # is the size of the barcodes. Invalid codes:
        CAACGCTAGAAT CCATCACATAGG GGCTAAACTATG

If you need help with QIIME, see:
http://help.qiime.org

I'm running QIIME 1.9.1 on Ubuntu 14.04.

Could anyone advise as to what I'm missing? Thank you!

nbokulich commented 7 years ago

I apologize for this issue — I think the recommendations in README.md may be based on outdated information for both of these datasets. Could you try:

split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.fastq.gz --rev_comp_mapping_barcodes

and see if that does the trick?
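For anyone reading along: as I understand the QIIME 1.9 options, --rev_comp_barcode reverse complements the barcodes taken from the index reads, while --rev_comp_mapping_barcodes reverse complements the barcodes listed in the mapping file. A minimal sketch of the operation, and of a quick check for which orientation matches, is below. The function names are hypothetical helpers for illustration, not part of QIIME, and the observed barcodes in the usage lines are made up.

```python
# Illustrative helpers only; none of these names come from QIIME.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(barcode):
    """Return the reverse complement of a DNA barcode (A, C, G, T, N only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(barcode.upper()))


def match_orientation(mapping_barcodes, observed_barcodes):
    """Count observed barcodes that match the mapping file as written
    versus after reverse complementing the mapping barcodes."""
    forward = set(mapping_barcodes)
    reverse = {reverse_complement(b) for b in mapping_barcodes}
    return {
        "as_is": sum(1 for b in observed_barcodes if b in forward),
        "reverse_complemented": sum(1 for b in observed_barcodes if b in reverse),
    }


if __name__ == "__main__":
    # One barcode from the error message above; the observed list is invented.
    mapping = ["AATCAACTAGGC"]
    observed = ["GCCTAGTTGATT", "GCCTAGTTGATT", "AATCAACTAGGC"]
    print(reverse_complement(mapping[0]))
    print(match_orientation(mapping, observed))
```

If most index-read barcodes only match after reverse complementing, that suggests one of the two flags is needed; which one depends on which side holds the valid Golay codes.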

fwhelan commented 7 years ago

Thanks for your quick response! For mock-5, I get the following:

mock-5$ split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.corrected.fastq.gz --rev_comp_mapping_barcodes
Traceback (most recent call last):
  File "/usr/local/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/usr/local/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 322, in process_fastq_single_end_read_file
    raise FastqParseError("Headers of barcode and read do not match. Can't continue. "
qiime.split_libraries_fastq.FastqParseError: Headers of barcode and read do not match. Can't continue. Confirm that the barcode fastq and read fastq that you are passing match one another.

nbokulich commented 7 years ago

This issue is my fault — I recently uploaded the fixed files per the recommendation in README.md and forgot to remove this note. The unmodified files (mock-index-read.fastq.gz) should work without error.

I have been making a number of fixes to data files recently and am currently working on a few updates, including to README.md files, so I apologize if you run into any more issues. Just let me know and I will fix them ASAP. Thanks!
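To confirm that a read file and an index file pair up before running split_libraries_fastq.py, something like the following sketch may help. It assumes standard four-line FASTQ records and compares only the first whitespace-delimited token of each header, which is my assumption about what needs to match, not a copy of QIIME's check. It is written for Python 3, so it would need to run outside the Python 2.7 environment that QIIME 1.9.1 uses. The function names are hypothetical.

```python
import gzip
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_ids(handle):
    """Yield the ID of each record: the header line without '@',
    truncated at the first whitespace. Assumes 4 lines per record."""
    for header in islice(handle, 0, None, 4):
        fields = header[1:].split()
        yield fields[0] if fields else ""


def first_header_mismatch(read_path, index_path, limit=10000):
    """Return (record_number, read_id, index_id) for the first mismatch
    within the first `limit` records, or None if all of them match."""
    with open_fastq(read_path) as reads, open_fastq(index_path) as index:
        pairs = zip(fastq_ids(reads), fastq_ids(index))
        for n, (read_id, index_id) in enumerate(islice(pairs, limit)):
            if read_id != index_id:
                return n, read_id, index_id
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_headers.py reads.fastq.gz index.fastq.gz
    print(first_header_mismatch(sys.argv[1], sys.argv[2]))
```

Note that this only compares records up to the shorter of the two files, so it will not flag a file that is simply truncated.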

fwhelan commented 7 years ago

That worked, at least for mock-5! To be honest, I'm just looking for a working set, so I can stick with mock-3-5. But for the purposes of furthering your project:

mock-7$ split_libraries_fastq.py -i mock-forward-read.fastq -o split_libraries -m sample-metadata.edit.tsv -b mock-index-read.fastq --rev_comp_mapping_barcodes
Traceback (most recent call last):
  File "/usr/local/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/usr/local/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/usr/local/lib/python2.7/dist-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/usr/local/lib/python2.7/dist-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: ILLUMINA_0275:2:1101:1357:1952#ATAGGCGATCNN. This may be because you passed an incorrect value for phred_offset.

Thanks so much for your help!
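A note for anyone who hits the same "Failed qual conversion" error: the message points at phred_offset, and split_libraries_fastq.py accepts a --phred_offset option (33 or 64). The sketch below is one rough way to guess the offset from the quality lines of a FASTQ file. The thresholds are a common heuristic and my own assumption, not something confirmed for mock-7 in this thread, and the function name is hypothetical.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset of FASTQ quality strings.

    Heuristic (an assumption, not a guarantee):
      - any character below ';' (ASCII 59) is only expected with offset 33
      - otherwise, any character above 'J' (ASCII 74) suggests offset 64
      - anything else is ambiguous, so return None
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:
        return 33
    if max(codes) > 74:
        return 64
    return None


if __name__ == "__main__":
    # Made-up quality strings for illustration, not taken from mock-7.
    print(guess_phred_offset(["IIII#5"]))
    print(guess_phred_offset(["hhhhffB"]))
    print(guess_phred_offset(["ABCDEFGHIJ"]))
```

If the guess comes back as 64, rerunning with --phred_offset 64 would be the thing to try; if it is ambiguous, sampling more records usually settles it.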

nbokulich commented 7 years ago

Thanks for confirming! And thanks for the error report on mock-7.

Please let me know if you run into any additional issues.

gregcaporaso commented 7 years ago

@nbokulich, can this issue be closed now?

nbokulich commented 7 years ago

@gregcaporaso I am waiting for PR #58 to merge, which includes related edits to READMEs. Thanks!

nbokulich commented 7 years ago

The README files have been updated with #58. Thanks @fwhelan for catching this!