SciLifeLab / bcbb

Useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Single sample on lane - wrong read count #296

Closed remiolsen closed 10 years ago

remiolsen commented 10 years ago

Unforeseen consequence (a.k.a. bug) of demuxing indices as reads: Casava will include the index reads when calculating read counts and Q values in the file '{fc_dir}/Unaligned/Basecall_Stats_{fc_id}/Demultiplex_Stats.htm'.

So a dual-indexed sample will be reported with 2x the actual number of reads, and a single-indexed sample with 1.5x.
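The factors above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes a paired-end run (two sequence reads per cluster), which the numbers imply but the issue does not state, and `inflation_factor` is a hypothetical helper, not part of bcbb or Casava.

```python
def inflation_factor(n_sequence_reads, n_index_reads):
    """Ratio of reported to actual read count when index reads are
    demultiplexed (and therefore counted) as ordinary reads.

    Assumption: every read segment of a cluster contributes equally
    to the reported count.
    """
    counted = n_sequence_reads + n_index_reads
    return counted / n_sequence_reads


# Paired-end run (assumed): two sequence reads per cluster.
dual_indexed = inflation_factor(2, 2)    # 4 counted / 2 real = 2.0
single_indexed = inflation_factor(2, 1)  # 3 counted / 2 real = 1.5
```

On a single-read run the same reasoning would give different factors, so the 2x and 1.5x figures should not be applied blindly as a correction.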

A possible solution is to first demultiplex the indices (e.g. base mask N,Y,Y,N), then demultiplex the reads (Y,N,N,Y).
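To illustrate the two-pass idea, here is a minimal sketch that derives both masks from the run layout. It uses the simplified Y/N notation from this comment, not the full Casava base mask syntax, and `two_pass_masks` is a hypothetical helper name.

```python
def two_pass_masks(segments):
    """Return (index_mask, read_mask) for a two-pass demultiplexing.

    `segments` lists the read segments in sequencing order, each either
    "read" or "index". The masks use the simplified Y/N notation from
    the issue: Y = output this segment, N = skip it.
    """
    index_mask = ",".join("Y" if s == "index" else "N" for s in segments)
    read_mask = ",".join("Y" if s == "read" else "N" for s in segments)
    return index_mask, read_mask


# Dual-indexed paired-end layout: read 1, index 1, index 2, read 2.
index_mask, read_mask = two_pass_masks(["read", "index", "index", "read"])
# index_mask == "N,Y,Y,N", read_mask == "Y,N,N,Y"
```

Each pass would then produce its own statistics, so the read counts from the second pass are no longer inflated by the index reads.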

b97pla commented 10 years ago

@remiolsen well spotted, I didn't think of this. I think your suggestion is very good and is actually a demultiplexing strategy we should implement for all runs in order to get better quality statistics on the index reads. We will need to consider the naming scheme and how to handle these results a bit though.


