Closed pturnbaugh closed 10 years ago
Hi @pturnbaugh,
The quality score threshold is stated incorrectly in the help file. It is not the minimum acceptable score but actually the maximum unacceptable score. The script only accepts scores greater than the cutoff, not greater than or equal to it.
Thanks for pointing this out - I'll check into it.
Update: fixed in #1245. (merged)
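To make the distinction concrete, here is a minimal sketch of the two readings of the threshold. It is illustrative only and is not the QIIME implementation; the function names are made up for this example.

```python
def accepted_strict(score, threshold):
    """Behaviour reported above: the threshold is the maximum
    unacceptable score, so only scores above it pass."""
    return score > threshold


def accepted_inclusive(score, threshold):
    """Behaviour the help text implied: the threshold is the
    minimum acceptable score."""
    return score >= threshold


threshold = 3
scores = [2, 3, 4]

# Only the score equal to the threshold is treated differently.
strict = [s for s in scores if accepted_strict(s, threshold)]
inclusive = [s for s in scores if accepted_inclusive(s, threshold)]
print(strict, inclusive)
```

The two rules disagree only for a score exactly equal to the threshold, which is why the wording in the help file matters.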
Where is the option for a barcode in the header? This is the format here and that used in previous generations.
Do you have barcodes in the headers of your fastq files? If so, could you post ~10 records worth of data here? We've never supported that format in fastq, but have in some of the other formats (e.g., see here).
Update: fixed in #1236.
Why can't we merge paired ends anymore?
This has never been supported in QIIME, but support likely will be added in QIIME 1.6.0-dev. You can find some discussion of how to do this here and here.
Update: fixed in #1216. (merged)
Sure, here's an example of our latest output format:
@ILLUMINA:371:D1G7GACXX:5:1101:1471:2066 1:N:0:GCCAATATCTCG
GTTAGTTTGAGACATTGAGATTGTAGGCCAATGATGTTTCAGGATGGGGAGAGTGTATGGTAAATTGTATCGGTGGAGTT
GCCAGAACCAGAAGTGGTTT
+
B@BFFFFFGHHHHJJIGHIJJIJHGJJJJJIJJIJIFGIIIIJFGGIJJDG@F8BFBGIJ=FEHJEIIGHHIHCDDDB;A
C;ACD=;ABBDBC?>CC3<9
@ILLUMINA:371:D1G7GACXX:5:1101:1426:2075 1:N:0:CAGATCATCCCG
AAAGAGCTTGGAAATCCAATGGGGTATCTATTGTTTTTGAAAAATGTACATAGTTTTCATGCCCAGCCTGCAAGCGTCCA
ACTTGACCGATCACCGCCAG
+
@CCFFFFFHHHHHJJJJJEGIJJJ<CGHIJJJJFGIIHFGIGHIIJDGIJJJJB@FHGIIIIJIIIIGEEG9CHEBDDDB
EEEDCCDDDBBBBCD@<B<B
@ILLUMINA:371:D1G7GACXX:5:1101:1326:2085 1:N:0:TNNNNNNNNNNN
GAAAGCCAGCGAGGCCTGGAAGGTGAAACGGGAGACGACGTTGACGAAGGTGACGAGCACGATCNCCATGANNGAGNCGN
NNNCCACCNANTCTTCGAAC
+
CCCFFFFFHGHGDIGHIEGHDIICDHIIGIJJEHEIIGGHACHEEDFFDC>AACBBD6=B####################
####################
@ILLUMINA:371:D1G7GACXX:5:1101:1459:2096 1:N:0:TGACCAATCTCG
CTCGAGAGAAACCACGGTCACGTGACCACAAACAGATTGCATTCAAACTTGGACTTGAGGAACATACGCCATTGTACGAG
TACAAGGTAGGATATCTCAC
+
CCCFFFFFGHGHGGGG>HEGIJEGGIGIJIIJIEEHIAGHGGGHIGIJJJJJIIIJIJJJJHHHHHGFFBBCACCCDDDD
??CCACD>@CBCCCDC>ACD
@ILLUMINA:371:D1G7GACXX:5:1101:1353:2100 1:N:0:TTGGGNATGTAG
CTTATTGTTTCTCTGCAATGGGCTTCTACGCGATTTGACGCCTTGGATGGCGGCGCTCCAGGAGCATCTACGGACATCCT
CGGAGTTGCCTCGATCTTCG
+
=?@DDD>2AFACFIBG@?E?E@@FGHCDFGHB:?DGGGI0DF:BA3==F@EB@@B8=C?02<?C@:4@@()5@A?>
@33092(4::@@BBCBBC38
@ILLUMINA:371:D1G7GACXX:5:1101:1385:2104 1:N:0:TGACCAATCTCG
GATCGCTTGGGAGATCCGCCACAGCAGATCTTCGCGCAGATTGGGCCGCAGCCGAGCCCAAGAGGTTTCCTCGAAGGCGG
TGCGGGCCGCTTTGACCGCA
+
CBCFFFFFGHHGHJJJJJJJJJJJIIJGIBGGIJJJIJJEHJBHIEIHDDDDDDDDBDDDDDDDD4<@CCCCDB<?9ABD
BBDDDDDDD>>BDACABB9
@ILLUMINA:371:D1G7GACXX:5:1101:1410:2104 1:N:0:CGATGTATCTCG
GGCTCCAAAAGAACTTGAATCGTACACGACGATTGAGGGAGAAGGACCGAAGGTCAAAGAGGGTCAGAAAGTAGCGGTCC
AGTATTCGGGATGGCTGTGG
+
@C@FDDFDHGHHHIIIJGIGGI@GHIHIIGIFGH=?FGIHIIGBHDGIJEGBB@CAEDECDDD??ACCCCDADDDBB@BB
DDCDC@ACDDBB0<CCBBCD
I'll look into process_iseq.py; we were previously using split_libraries_illumina.py with the "--barcode_in_header" flag.
As for the paired ends, I didn't actually mean assembling them. In split_libraries_illumina.py (QIIME 1.3.0) you could input the forward and reverse reads using the "-5" and "-3" flags, and the result would be the two reads of each pair joined together. Will this be added back to split_libraries_fastq.py in QIIME 1.6?
I generally run split_libraries_fastq.py twice: once for the 5' reads and once for the 3' reads, and you'll get an output fasta file for each. In the future we'll add stitching of overlapping reads together.
Okay, process_iseq.py works, but would require us to convert from fastq -> iseq -> fastq. Would it be possible to add support for our fastq format (above)? According to our core facility, it is the latest Illumina default format.
We can work on getting that in place, but it definitely won't be in the QIIME 1.6.0 release (we're too close now). Note that the sequencing center should be able to output the barcodes in a separate file - the sequencing centers that we work with are doing that.
"the latest Illumina default format"
If they would only stop changing their default!!!
Ok, sounds good!
Definitely interested in this topic as well. I'm a developer with the Qiime website and we're starting to see a number of barcoded fastq files come through. Currently I'm attempting to strip the barcodes into a separate file and run the two files as a pair.
@douginator2000, is the format that you're seeing the same as the one that @pturnbaugh pasted above? If you're seeing others as well can you paste an example of ~3 fastq records on this thread? Thanks!
One possible way to handle this, if we are seeing the barcodes in different places in the fastq (header vs. beginning of reads), would be to have, e.g., a process_barcoded_fastq.py script (I believe @walterst has been working on something like this, which isn't in QIIME) that generates the paired barcode/read fastq files that split_libraries_fastq.py expects. This would be similar to the current process_iseq.py and process_qseq.py, which is how we've handled the different formats in the past.
If the barcodes are currently always in the same place and likely to stay that way (which seems unlikely given Illumina's history of lack of standards in data delivery) then this is over-complicated. But if they're likely to continue to move around then this is a much better solution relative to integrating all of this in the split_libraries_fastq.py interface, as there is great benefit in keeping that interface simple and understandable.
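As a rough illustration of the preprocessing idea above, the sketch below takes fastq records whose barcode is the last colon-separated field of the header (as in the records pasted earlier in this thread) and writes a read fastq and a matching barcode fastq. It is not the script @walterst has been working on and is not part of QIIME; the function names are hypothetical, the input is assumed to have exactly four unwrapped lines per record, and because the header carries no quality scores for the barcode, a placeholder quality character is written instead.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples.

    Assumes exactly four lines per record with no wrapped lines;
    stops at end of file or at the first empty line.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def barcode_from_header(header):
    """'@ID 1:N:0:GCCAATATCTCG' -> 'GCCAATATCTCG'."""
    return header.rsplit(":", 1)[1]


def split_header_barcodes(in_handle, reads_out, barcodes_out,
                          placeholder_qual="I"):
    """Write one read record and one barcode record per input record.

    placeholder_qual is an assumption: the header has no per-base
    quality for the barcode, so a constant character is used.
    """
    for header, seq, qual in read_fastq(in_handle):
        barcode = barcode_from_header(header)
        read_id = header.split(" ", 1)[0]  # same ID in both outputs
        reads_out.write(f"{read_id}\n{seq}\n+\n{qual}\n")
        barcodes_out.write(
            f"{read_id}\n{barcode}\n+\n{placeholder_qual * len(barcode)}\n"
        )


# Usage with an in-memory record (sequence and quality truncated for brevity).
example = io.StringIO(
    "@ILLUMINA:371:D1G7GACXX:5:1101:1471:2066 1:N:0:GCCAATATCTCG\n"
    "GTTAGTTTGA\n"
    "+\n"
    "B@BFFFFFGH\n"
)
reads_out, barcodes_out = io.StringIO(), io.StringIO()
split_header_barcodes(example, reads_out, barcodes_out)
print(barcodes_out.getvalue())
```

Keeping this step in a separate script leaves the demultiplexing interface unchanged, which is the design benefit argued for above.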
For our part we have no plan to move our barcodes. Having them integrated into the front of the read, à la 454, has served us well; the only changes have been to the length, plus some experiments to deal with the diversity problem in amplicon sequencing on the Illumina platform. Is this still in process for the 1.6.0-dev cycle, or are we looking at 1.7.0? We're still in favor of something that can parse the first n bp of a read (according to the mapping file with barcode and linkerprimer) and split the libraries accordingly.
We have to move this to the 1.7.0-dev cycle, but this is high priority.
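For reference, the in-read approach requested above (take the first n bp of each read as the barcode, look it up, and optionally strip the linker/primer) could look roughly like the following. This is only a sketch under stated assumptions: the function name and the barcode_to_sample dictionary are hypothetical stand-ins for the mapping file, matching is exact (no barcode error correction), and the primer is treated as a literal string with no degenerate bases.

```python
def demultiplex_read(seq, qual, barcode_to_sample, barcode_length, primer=""):
    """Assign a read to a sample using its first barcode_length bases.

    Returns (sample_id, trimmed_seq, trimmed_qual), or None when the
    barcode is not present in barcode_to_sample.
    """
    barcode = seq[:barcode_length]
    sample_id = barcode_to_sample.get(barcode)
    if sample_id is None:
        return None

    # Drop the barcode from both the sequence and its quality string.
    rest_seq = seq[barcode_length:]
    rest_qual = qual[barcode_length:]

    # Optionally remove an exact-match linker/primer that follows the barcode.
    if primer and rest_seq.startswith(primer):
        rest_seq = rest_seq[len(primer):]
        rest_qual = rest_qual[len(primer):]

    return sample_id, rest_seq, rest_qual


# Hypothetical mapping: barcode -> sample ID.
barcode_to_sample = {"ACGT": "sample1"}

hit = demultiplex_read("ACGTGGTTAACC", "IIIIHHHHGGGG",
                       barcode_to_sample, barcode_length=4, primer="GG")
miss = demultiplex_read("TTTTGGTTAACC", "IIIIHHHHGGGG",
                        barcode_to_sample, barcode_length=4, primer="GG")
print(hit, miss)
```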