fastq-multx supporting sequence in header

Pasted from the forums.

This is an important feature that fastq-mcf should handle, but currently does
not. Also, I noticed that Illumina outputs GAGATTCC+GGCTCTGA for dual-indexed
files. It's not hard to do in the code, and it is a feature that I intend to
add.

On Saturday, June 14, 2014 6:20:32 PM UTC-4, Christopher Laumer wrote:
Can fastq-multx (or any other tool that people know of) demultiplex PE fastq 
files based on the index sequence given in the sequence *headers*, not in the 
sequence itself?

For instance consider a 100 bp fastq looking like this (with a mate in a 
different file):

@ILLUMINA-D00365:240:H9N3RADXX:2:1101:2110:2045 1:N:0:GAGATTCCGGCTCTGA
AAGCCGGTATTTAAATATCTTATTGAAAAAATAATTTTATGGTTTGTTTTATTCTTTTAAATAAAATCTTTTAAATCAACTCTTTTTTATTCGGCTATTT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJIJJJJJJJJJJJJJJIJJJHJJJJJJJJJJJJJJJJJJJJJJHHHHHHFFFFFFEEEEEEDDDDEDDDDDDDE

The index (here, two 8bp dual indices concatenated) is in the sequence name at 
the end ("1:N:0:GAGATTCCGGCTCTGA").
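
As a minimal sketch of what "index in the sequence name" means in practice, the
Python below pulls the last colon-separated field out of a header line shaped
like the one above. The function names header_index and split_dual_index are
hypothetical, not part of fastq-multx, and the 8 bp split point is an assumption
taken from this example. A "+" between the two indices is dropped so that
GAGATTCC+GGCTCTGA and GAGATTCCGGCTCTGA compare equal.

```python
def header_index(header_line):
    """Return the index field from a header like '@name 1:N:0:GAGATTCCGGCTCTGA'.

    Assumes the header has a whitespace-separated comment whose last
    colon-separated field is the index (hypothetical helper, not ea-utils code).
    """
    # Text after the first run of whitespace, e.g. '1:N:0:GAGATTCCGGCTCTGA'
    comment = header_line.rstrip().split(None, 1)[1]
    # Last field after ':'; remove '+' so both dual-index spellings match
    return comment.rsplit(":", 1)[1].replace("+", "")


def split_dual_index(index, first_len=8):
    """Split a concatenated dual index into (first, second) parts.

    first_len=8 is an assumption based on the 8 bp indices in this example.
    """
    return index[:first_len], index[first_len:]
```

A header with no comment part would raise IndexError here; a real tool would
need to decide how to treat such reads.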

From all I can gather, the normal behavior of fastq-multx is to look for the
index within the sequence itself, but these are reads that have already been
"demultiplexed" by CASAVA using the wrong indices (so they ended up in the
"UndeterminedIndices" file... long story).

Does anyone have any ideas on how to handle this (or know whether fastq-multx
can)? I really appreciate the input!
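
One possible workaround, sketched here in Python rather than as anything
fastq-multx itself provides, is to read both mate files in step and route each
pair by the index found in the read 1 header. All names below (read_fastq,
index_of, demultiplex_pair, the open_out callback, the "unmatched" bucket and
the sample name) are hypothetical. The sketch assumes unwrapped four-line
records, mates in the same order in both files, and exact index matches only
(no mismatch tolerance).

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a four-line-per-record FASTQ handle."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # empty string means end of file
            return
        yield tuple(line.rstrip("\n") for line in record)


def index_of(header):
    """Last ':' field of the header comment, with any '+' removed."""
    return header.split(None, 1)[1].rsplit(":", 1)[1].replace("+", "")


def demultiplex_pair(r1_in, r2_in, barcodes, open_out):
    """Route read pairs to per-sample outputs by the index in the R1 header.

    barcodes maps index sequence -> sample name; open_out(sample, mate)
    returns a writable handle. Returns the number of pairs per sample.
    """
    outputs, counts = {}, {}
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        sample = barcodes.get(index_of(rec1[0]), "unmatched")
        for mate, rec in ((1, rec1), (2, rec2)):
            key = (sample, mate)
            if key not in outputs:
                outputs[key] = open_out(sample, mate)
            outputs[key].write("\n".join(rec) + "\n")
        counts[sample] = counts.get(sample, 0) + 1
    return counts


# In-memory usage example; real use would pass open file handles instead.
r1 = io.StringIO("@read1 1:N:0:GAGATTCCGGCTCTGA\nACGT\n+\nFFFF\n"
                 "@read2 1:N:0:AAAAAAAAAAAAAAAA\nTTTT\n+\nFFFF\n")
r2 = io.StringIO("@read1 2:N:0:GAGATTCCGGCTCTGA\nTGCA\n+\nFFFF\n"
                 "@read2 2:N:0:AAAAAAAAAAAAAAAA\nGGGG\n+\nFFFF\n")
sinks = {}


def open_sink(sample, mate):
    sinks[(sample, mate)] = io.StringIO()
    return sinks[(sample, mate)]


counts = demultiplex_pair(r1, r2, {"GAGATTCCGGCTCTGA": "sampleA"}, open_sink)
```

Keeping every output handle open is fine for a handful of samples; with many
barcodes a real implementation would need to watch the open-file limit.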

Original issue reported on code.google.com by earone...@gmail.com on 9 Jul 2014 at 2:12

Merged into: #30
