caporaso-lab / mockrobiota

A public resource for microbiome bioinformatics benchmarking using artificially constructed (i.e., mock) communities.
http://mockrobiota.caporasolab.us
BSD 3-Clause "New" or "Revised" License

Failing to demultiplex mock-3 #61

Closed benjjneb closed 7 years ago

benjjneb commented 7 years ago

I am trying to use QIIME 1.9.1 to demultiplex mock-3. When I enter the command listed in the readme (split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.fastq.gz) I get the following error:

MacQIIME PHPMB13:mock3 $ split_libraries_fastq.py -i mock-forward-read.fastq.gz -o split_libraries -m sample-metadata.tsv -b mock-index-read.fastq.gz
Error in split_libraries_fastq.py: Some or all barcodes are not valid golay codes. Do they need to be reverse complemented? If these are not golay barcodes pass --barcode_type 12 to disable barcode error correction, or pass --barcode_type # if the barcodes are not 12 base pairs, where # is the size of the barcodes. Invalid codes:
    AATCAACTAGGC CAAATGGTCGTC ACACATAAGTCG TGTACGGATAAC

If you need help with QIIME, see:
http://help.qiime.org
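
For reference, the reverse complement that the error message asks about can be computed by hand. The snippet below is a minimal sketch, not part of QIIME: the helper name `reverse_complement` is made up for illustration, and it only prints the reverse complements of the four barcodes listed as invalid above so they can be compared against the mapping file. It does not check whether either orientation is a valid Golay code.

```python
# Minimal sketch: reverse-complement the barcodes reported in the error above.
# `reverse_complement` is a hypothetical helper, not a QIIME function.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


# The four barcodes listed under "Invalid codes:" in the error message.
invalid_codes = ["AATCAACTAGGC", "CAAATGGTCGTC", "ACACATAAGTCG", "TGTACGGATAAC"]

for code in invalid_codes:
    print(code, "->", reverse_complement(code))
```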

When I muck around with the revcomp options, I can get past that, but get stuck here:

MacQIIME PHPMB13:mock3 $ split_libraries_fastq.py -i mock-forward-read.fastq -b mock-index-read.fastq -o out --store_demultiplexed_fastq -m sample-metadata.tsv  --rev_comp_mapping_barcodes
Traceback (most recent call last):
  File "/macqiime/anaconda/bin/split_libraries_fastq.py", line 365, in <module>
    main()
  File "/macqiime/anaconda/bin/split_libraries_fastq.py", line 344, in main
    for fasta_header, sequence, quality, seq_id in seq_generator:
  File "/macqiime/anaconda/lib/python2.7/site-packages/qiime/split_libraries_fastq.py", line 317, in process_fastq_single_end_read_file
    parse_fastq(fastq_read_f, strict=False, phred_offset=phred_offset)):
  File "/macqiime/anaconda/lib/python2.7/site-packages/skbio/parse/sequences/fastq.py", line 174, in parse_fastq
    seqid)
skbio.parse.sequences._exception.FastqParseError: Failed qual conversion for seq id: A0A3V120410:1:10:10065:26809/1. This may be because you passed an incorrect value for phred_offset.

I've tried specifying phred_offsets of 33 and 64, but that doesn't help.
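
One way to check which offset a file actually uses is to look at the lowest quality character it contains. The sketch below assumes a plain four-line-per-record FASTQ (optionally gzipped); `guess_phred_offset` is a hypothetical helper, not something QIIME or scikit-bio provides, and its answer is only a heuristic.

```python
import gzip
from itertools import islice


def guess_phred_offset(path, max_records=10000):
    """Guess the Phred offset (33 or 64) of a FASTQ file; None if unclear.

    Assumes four lines per record, so quality strings sit on every 4th line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    lowest = None
    with opener(path, "rt") as handle:
        # Lines 3, 7, 11, ... (0-based) are the quality lines.
        for line in islice(handle, 3, max_records * 4, 4):
            qual = line.rstrip("\r\n")
            if qual:
                low = min(map(ord, qual))
                lowest = low if lowest is None else min(lowest, low)
    if lowest is None:
        return None  # no quality lines found
    if lowest < 59:
        return 33  # characters below ';' cannot occur with a +64 encoding
    if lowest >= 64:
        return 64  # likely +64, although high-quality +33 data can look the same
    return None  # 59-63: ambiguous (e.g. old Solexa scores)
```

For example, `guess_phred_offset("mock-forward-read.fastq")` could be run on the forward reads and again on the index reads. If both report 33 and the parser still fails, the cause is probably something other than the offset.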

nbokulich commented 7 years ago

Thanks for raising this issue, @benjjneb. I got some advice from @wasade, who has processed mock-3 with qiime1 more recently (I have been using qiime2 without issue on this dataset). He notes that:

it looks like we needed to rc both mapping and the index read
and specify --phred_offset 33
it looks like we also needed to use the simplified headers, where the " 1:N:0:1" bits were removed
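
The header simplification mentioned in the last point could be done with a short script. This is a sketch under assumptions, not the procedure @wasade used: it assumes four-line FASTQ records and that the part to remove is a trailing space-separated field shaped like " 1:N:0:1". The function names are hypothetical.

```python
import gzip
import re

# Trailing Illumina-style comment field such as " 1:N:0:1" at the end of a header.
_SUFFIX = re.compile(r" [0-9]+:[YN]:[0-9]+:\S*$")


def simplify_header(header: str) -> str:
    """Drop a trailing ' <read>:<filtered>:<control>:<index>' field, if present."""
    return _SUFFIX.sub("", header)


def simplify_fastq_headers(in_path: str, out_path: str) -> None:
    """Copy a FASTQ file, simplifying the header line of each four-line record."""
    in_open = gzip.open if in_path.endswith(".gz") else open
    out_open = gzip.open if out_path.endswith(".gz") else open
    with in_open(in_path, "rt") as src, out_open(out_path, "wt") as dst:
        for i, line in enumerate(src):
            if i % 4 == 0:  # first line of each record is the header
                line = simplify_header(line.rstrip("\r\n")) + "\n"
            dst.write(line)
```

The same function would need to be applied to both the forward-read and index-read files so that their headers still match each other.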

@benjjneb if you can confirm that this works for you, and provide a working qiime1 command to me, I will correct this in the mock-3 README and close this issue. Thanks!

benjjneb commented 7 years ago

Thanks, can confirm the following worked:

split_libraries_fastq.py -i mock-forward-read.fastq -m sample-metadata.tsv -o out --store_demultiplexed_fastq -b mock-index-read.fastq --rev_comp_mapping_barcodes --rev_comp_barcode --phred_offset 33

nbokulich commented 7 years ago

Thanks @benjjneb and @wasade for testing this! The readme is corrected in #62.