SciLifeLab / bcbb

Useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Demultiplexing mixed length dual index runs #291

Closed remiolsen closed 10 years ago

remiolsen commented 10 years ago

If a dual-indexed run contains samples with differing index lengths, the basemask might be borked for the shortest index. Example samplesheet with 12bp (6+6) and 16bp (8+8) dual indexes:

    FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
    H7CG1ADXX,1,P961_1001,hg19,ACAGAT-ATCAGC,J__Doe_14_01,N,,BA,J__Doe_14_01
    H7CG1ADXX,1,P961_1002,hg19,GTATGA-ATCAGC,J__Doe_14_01,N,,BA,J__Doe_14_01
    ...
    H7CG1ADXX,1,P961_1009,hg19,TAAGGCGA-TAGATCGC,J__Doe_14_01,N,,BA,J__Doe_14_01

RunInfo.xml:

    <Reads>
      <Read Number="1" NumCycles="51" IsIndexedRead="N" />
      <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
      <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
    </Reads>

bcl2fastq will be run like this for the 12bp samplesheet:

    /usr/local/bin/configureBclToFastq.pl (...) --use-bases-mask Y51,I8,I4N4

The correct basemask is:

    Y51,I6N2,I6N2
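
For illustration, here is a minimal Python sketch of how the expected mask could be derived per index read. It is not the bcbb implementation; `parse_reads` and `bases_mask` are hypothetical helper names, and the sketch assumes the samplesheet Index field separates the two indexes with `-` and that each index maps to one index read in run order. The wrong mask above is consistent with the 12 bases being spread over the index reads as 8 + 4, whereas each read should be masked against its own index length:

```python
import xml.etree.ElementTree as ET

# The <Reads> section from RunInfo.xml quoted above.
RUNINFO_READS = """
<Reads>
  <Read Number="1" NumCycles="51" IsIndexedRead="N" />
  <Read Number="2" NumCycles="8" IsIndexedRead="Y" />
  <Read Number="3" NumCycles="8" IsIndexedRead="Y" />
</Reads>
"""


def parse_reads(xml_text):
    """Return [(num_cycles, is_index_read), ...] ordered by read number."""
    root = ET.fromstring(xml_text.strip())
    reads = sorted(root.iter("Read"), key=lambda r: int(r.get("Number")))
    return [(int(r.get("NumCycles")), r.get("IsIndexedRead") == "Y")
            for r in reads]


def bases_mask(reads, index_field):
    """Build a bases mask, handling each index read separately.

    index_field is the samplesheet Index column, e.g. "ACAGAT-ATCAGC"
    (assumption: dual indexes are joined with '-').
    """
    index_lengths = [len(idx) for idx in index_field.split("-")]
    parts = []
    for cycles, is_index in reads:
        if not is_index:
            parts.append("Y%d" % cycles)
            continue
        # Take the next index length; an index read with no index is fully masked.
        used = index_lengths.pop(0) if index_lengths else 0
        if used > cycles:
            raise ValueError("index longer than the index read")
        mask = ""
        if used:
            mask += "I%d" % used
        if cycles - used:
            mask += "N%d" % (cycles - used)
        parts.append(mask)
    return ",".join(parts)


reads = parse_reads(RUNINFO_READS)
print(bases_mask(reads, "ACAGAT-ATCAGC"))      # 6+6 index
print(bases_mask(reads, "TAAGGCGA-TAGATCGC"))  # 8+8 index
```

With the reads above, the 6+6 index gives `Y51,I6N2,I6N2` and the 8+8 index gives `Y51,I8,I8`.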

mariogiov commented 10 years ago

I wish I could comment on a specific line in your comment to +1 the use of borked