genome / pindel

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
GNU General Public License v3.0

wrong VAF estimation of pindel #95

Open jinxinhao0627 opened 6 years ago

jinxinhao0627 commented 6 years ago

Hi,

I am using Pindel to call indels in our tumor-normal samples. The data come from targeted sequencing, and the libraries were constructed by capture.

I mapped the trimmed reads to the hg19 reference with BWA-MEM using default parameters. After getting the sorted BAM file, I used samtools to remove duplicates. However, samtools does not handle duplicates well, so they may not have been fully removed.

I then ran Pindel to call indels. Because we know the true variants in these samples, we can see that the VAFs in the Pindel VCF are much lower than the expected values. We use a cut-off of 1% for somatic variants, so many indels are filtered out by this threshold.

I only changed -w to 2 and turned off SV calling; all other parameters are at their defaults. Why does this happen, and how should I adjust the parameters to get the correct VAFs for indels?

The following four results are from Pindel. The expected VAFs of these four indels are 0.08, 0.15, 0.03 and 0.12, but according to our Pindel results the VAFs are 0.002, 0.009, 0.0013 and 0.0032, respectively.
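To make the effect of the 1% cut-off concrete, here is a minimal Python sketch. It assumes that the lower set of values is what Pindel reported and the higher set is the expected answer, and that a variant passes when its VAF is at least 0.01 (whether the real filter is inclusive is also an assumption). The variable names are only for illustration.

```python
# Somatic cut-off quoted in the post (1%).
CUTOFF = 0.01

# Values quoted in the post; which set is "expected" and which is
# "reported" is an assumption based on the surrounding description.
expected_vaf = [0.08, 0.15, 0.03, 0.12]
reported_vaf = [0.002, 0.009, 0.0013, 0.0032]


def passes_cutoff(vaf, cutoff=CUTOFF):
    """Return True if the VAF is at or above the cut-off (assumed inclusive)."""
    return vaf >= cutoff


kept_expected = [v for v in expected_vaf if passes_cutoff(v)]
kept_reported = [v for v in reported_vaf if passes_cutoff(v)]

print(len(kept_expected), "of", len(expected_vaf), "expected VAFs pass")
print(len(kept_reported), "of", len(reported_vaf), "reported VAFs pass")
```

Under these assumptions all four expected values pass the threshold and none of the reported values do, which is why the indels are lost at the filtering step.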

[Attached screenshots: chr22, chr3-1, chr3-2, chr3-3]

Thank you for your time.

liangkaiye commented 6 years ago

Pindel is a variant discovery tool, so it is stringent in selecting the reads that support a call. At the same time, it counts reference-supporting reads aggressively. Together, these two effects push the reported VAF down. A separate allele-counting script could correct this.
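As a rough illustration of why this lowers the VAF, the sketch below computes VAF as alt / (alt + ref) for two sets of read counts. The counts are hypothetical and chosen only to show the direction of the bias; this is not Pindel's internal counting logic, and the function name is made up for the example.

```python
def vaf(alt_reads: int, ref_reads: int) -> float:
    """Variant allele frequency: alt-supporting reads over all counted reads."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads counted at this site")
    return alt_reads / total


# Hypothetical site: 80 reads truly carry the indel, 920 carry the reference.
balanced = vaf(alt_reads=80, ref_reads=920)

# Hypothetical biased counting: stringent read selection keeps only 20 of the
# alt-supporting reads, while permissive reference counting reports 980.
biased = vaf(alt_reads=20, ref_reads=980)

print(balanced)  # 0.08
print(biased)    # 0.02
```

A separate allele-counting step would aim to apply the same read-selection criteria to both alleles, so that the numerator and denominator are counted consistently.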


jinxinhao0627 commented 6 years ago

Is this separate allele-counting script available now? If so, how can I get it? Thank you very much.