PASTED from forums.
This is an important feature that fastq-mcf should handle, but currently does
not. Also, I noticed that Illumina outputs GAGATTCC+GGCTCTGA for dual-indexed
files. It's not hard to do in the code, and it is a feature that I intend to
add.
On Saturday, June 14, 2014 6:20:32 PM UTC-4, Christopher Laumer wrote:
Can fastq-multx (or any other tool that people know of) demultiplex PE fastq
files based on the index sequence given in the sequence *headers*, not in the
sequence itself?
For instance consider a 100 bp fastq looking like this (with a mate in a
different file):
@ILLUMINA-D00365:240:H9N3RADXX:2:1101:2110:2045 1:N:0:GAGATTCCGGCTCTGA
AAGCCGGTATTTAAATATCTTATTGAAAAAATAATTTTATGGTTTGTTTTATTCTTTTAAATAAAATCTTTTAAATCAACTCTTTTTTATTCGGCTATTT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJIJJJJJJJJJJJJJJIJJJHJJJJJJJJJJJJJJJJJJJJJJHHHHHHFFFFFFEEEEEEDDDDEDDDDDDDE
The index (here, two 8bp dual indices concatenated) is in the sequence name at
the end ("1:N:0:GAGATTCCGGCTCTGA").
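As a rough illustration of pulling that index out of the header, here is a minimal Python sketch. The function name header_index and the index1_len parameter are made up for this example and are not part of fastq-multx or any other tool. It assumes the index is whatever follows the last ':' in the header, and that a concatenated dual index splits after the first 8 bases unless a '+' separator is present.

```python
def header_index(header, index1_len=8):
    """Return (first_index, second_index) from a FASTQ header line.

    Assumption: the index is the text after the last ':' in the header.
    A '+' separates dual indices if present; otherwise the string is
    split after index1_len bases (8 here, matching the example read).
    """
    index = header.rstrip().rsplit(":", 1)[-1]
    if "+" in index:
        first, second = index.split("+", 1)
    else:
        first, second = index[:index1_len], index[index1_len:]
    return first, second


h = "@ILLUMINA-D00365:240:H9N3RADXX:2:1101:2110:2045 1:N:0:GAGATTCCGGCTCTGA"
print(header_index(h))  # ('GAGATTCC', 'GGCTCTGA')
```

A single-index header simply comes back with an empty second element, so the same call works for both layouts.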
From all I can gather the normal behavior of fastq-multx is to look for the
index within the sequence itself - but these are reads that have already been
"demultiplexed" by CASAVA but using the wrong indices (so they made it into the
"UndeterminedIndices" file... long story).
Does anyone have any ideas how to handle this (or whether fastq-multx can)? I
really appreciate the input!
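One possible workaround outside fastq-multx is a small script that groups read pairs by the index found in the read 1 header. The sketch below is only an illustration under several assumptions: both mate files list reads in the same order, every record is exactly four lines, matching is exact (no mismatches allowed), and the names read_fastq, demux_by_header, and sampleA are hypothetical. It collects records in memory for brevity; real data would be written to per-sample files instead.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of file
            return
        yield tuple(record)


def demux_by_header(r1, r2, barcodes, unmatched="unmatched"):
    """Group paired records by the index in the read 1 header.

    barcodes maps an index string (e.g. "GAGATTCCGGCTCTGA") to a sample
    name. Returns {sample: [(rec1, rec2), ...]}.
    """
    groups = {}
    for rec1, rec2 in zip(read_fastq(r1), read_fastq(r2)):
        # Index = text after the last ':'; drop any '+' between dual indices.
        index = rec1[0].rsplit(":", 1)[-1].replace("+", "")
        sample = barcodes.get(index, unmatched)
        groups.setdefault(sample, []).append((rec1, rec2))
    return groups


# Tiny in-memory stand-ins for the two mate files.
r1 = io.StringIO(
    "@a 1:N:0:GAGATTCCGGCTCTGA\nACGT\n+\nIIII\n"
    "@b 1:N:0:TTTTTTTTTTTTTTTT\nGGGG\n+\nIIII\n"
)
r2 = io.StringIO(
    "@a 2:N:0:GAGATTCCGGCTCTGA\nTTTT\n+\nIIII\n"
    "@b 2:N:0:TTTTTTTTTTTTTTTT\nCCCC\n+\nIIII\n"
)
groups = demux_by_header(r1, r2, {"GAGATTCCGGCTCTGA": "sampleA"})
print({sample: len(pairs) for sample, pairs in groups.items()})
```

Here the first pair lands under sampleA and the second, whose index is not in the barcode table, falls into the unmatched group.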
Original issue reported on code.google.com by earone...@gmail.com on 9 Jul 2014 at 2:12