Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0
Fix multi-threading bug that causes incorrect results #106
We have observed that, for a fraction of the events detected in our data files, Pindel's output varies with the number of threads it uses. Specifically, the per-sample per-strand counts are incorrect: for example, 350+51 = 401 = 277+124, i.e., the total number of supporting reads is unchanged, but some of the reads have been assigned to the wrong strand.
Output is unchanged at low thread counts, but it starts to differ at T=7, and by T=16 it is dramatically incorrect.
This is fixed (and the output no longer varies with the number of threads used) by correcting ReadBuffer::flush() to maintain the order of m_rawreads[] entries when they are copied into m_filteredReads[] regardless of threading indeterminacy.
I suspect this patch may also fix or affect #26, which appears to be a similar problem.
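To illustrate the kind of ordering fix described above, here is a minimal, self-contained sketch of a parallel filter whose output order does not depend on thread scheduling. It is not the code from this patch: `Read`, `passes_filter` and `filter_preserving_order` are hypothetical names, and the quality threshold is an arbitrary stand-in for whatever filtering `ReadBuffer::flush()` actually performs. The idea is that each thread only records a keep/drop decision in the slot for its own input index, and a single sequential pass then copies the survivors in input order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a read record; not Pindel's actual type.
struct Read {
    std::string name;
    int quality;
};

// Hypothetical filter predicate, assumed thread-safe and free of side effects.
inline bool passes_filter(const Read& r) { return r.quality >= 20; }

// Filter `raw` using several threads, but emit survivors in input order.
std::vector<Read> filter_preserving_order(const std::vector<Read>& raw,
                                          unsigned num_threads) {
    assert(num_threads > 0);

    // One flag per input index. Each thread writes only its own slots, so no
    // lock is needed and the flags do not depend on which thread runs first.
    // (std::vector<char>, not std::vector<bool>, so slots are distinct bytes.)
    std::vector<char> keep(raw.size(), 0);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&raw, &keep, t, num_threads] {
            // Strided partition: thread t handles indices t, t+N, t+2N, ...
            for (std::size_t i = t; i < raw.size(); i += num_threads) {
                keep[i] = passes_filter(raw[i]) ? 1 : 0;
            }
        });
    }
    for (auto& w : workers) w.join();

    // Sequential compaction: copying in index order keeps the raw order.
    std::vector<Read> filtered;
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (keep[i]) filtered.push_back(raw[i]);
    }
    return filtered;
}
```

With this structure, calling `filter_preserving_order(raw, 1)` and `filter_preserving_order(raw, 16)` on the same input yields the same sequence of reads. By contrast, a version in which each thread appends directly to a shared output vector (even under a lock) would produce an order that depends on thread timing, which is the kind of nondeterminism that can change downstream per-strand counts.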