Illumina / Pisces

Somatic and germline variant caller for amplicon data. Recommended caller for tumor-only workflows.
GNU General Public License v3.0

strand bias for targeted amplicon sequencing #15

Open ShannonDaddy opened 6 years ago

ShannonDaddy commented 6 years ago

Hi, I found that for targeted amplicon sequencing data, most of the mutations called by Pisces 5.2.7.47 are flagged as 'SB'. Is this a property of amplicon sequencing data, or of primer specificity? How is strand bias calculated? I also checked the attached .vcf and .ReadStrandBias.txt result files, and I don't understand why the following two variants get different strand bias values; one is NaN, the other is -100.0000:

chr7 55241707 . G A 0 q30;SB DP=70614 GT:GQ:AD:DP:VF:NL:SB 0/1:0:68933,1677:70614:0.02375:10:NaN
chr12 25398266 . G A 100 PASS DP=42537 GT:GQ:AD:DP:VF:NL:SB 0/1:100:36721,5816:42537:0.13673:10:-100.0000

Also, what are "ReadType_0 ReadType_1 ReadType_2" in .ReadStrandBias.txt?

Can you help me with this? Thank you very much! results.zip

tamsen commented 6 years ago

Hi there,

Sorry for the late response. I've been doing a lot of travelling, and will be on the road for the next few months. We have not found 5.2.7.47 to differ significantly from other versions in its SB calculations. You can always look at your variant in IGV and see whether there is indeed strand bias (look at the forward and reverse counts).

To answer your question re SB calculations - Pisces has two methods for determining SB:

Method A

By default, Pisces checks whether it is statistically more likely, given the observations, that the variant exists on one strand and not the other, versus on both strands. This is kinder to low-depth situations (it understands that a 10% variant isn't very likely to have evidence in both directions if the depth is 10). You can tweak the MaxAcceptableStrandBiasFilter setting if you want more or less stringency in the likelihood calculation.
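To make the low-depth intuition concrete, here is a minimal sketch. It is not Pisces's actual likelihood model: it assumes the depth is split evenly between forward and reverse reads and that each read independently carries the variant with probability equal to the variant frequency. The function name is made up for illustration.

```python
def prob_variant_seen_on_both_strands(depth: int, vf: float) -> float:
    """Probability that at least one variant read lands on each strand.

    Illustrative assumptions (NOT Pisces's model): depth is split evenly
    between forward and reverse reads, and every read independently
    carries the variant with probability vf.
    """
    fwd = depth // 2
    rev = depth - fwd
    # P(at least one variant read on a strand) = 1 - P(no variant reads)
    p_fwd = 1.0 - (1.0 - vf) ** fwd
    p_rev = 1.0 - (1.0 - vf) ** rev
    return p_fwd * p_rev


if __name__ == "__main__":
    # Compare a shallow site with deeper ones for a 10% variant.
    for depth in (10, 100, 1000):
        p = prob_variant_seen_on_both_strands(depth, 0.10)
        print(f"depth={depth:5d}  P(both strands)={p:.4f}")
```

Under these assumptions, a 10% variant at depth 10 shows up on both strands only about 17% of the time, so demanding evidence on both strands at that depth would reject most real variants.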

Method B

Pisces can apply an absolute filter to enforce that evidence for the variant must be observed on both strands (you can turn this on/off with the SSFilter (EnableSingleStrandFilter) setting; from your command line, you have it OFF). We use this for enrichment or non-targeted assays, where we can generally expect that a variant will always be covered from both directions.
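The absolute rule in Method B can be sketched as a simple check on per-strand variant support. This is only an illustration of the idea described above; the function name and its inputs are hypothetical, and how Pisces itself counts support (for example, how stitched reads are treated) is not covered here.

```python
def passes_single_strand_filter(alt_fwd: int, alt_rev: int) -> bool:
    """Return True only if the variant has support on both strands.

    Hypothetical helper: alt_fwd and alt_rev are the numbers of
    variant-supporting reads on the forward and reverse strands.
    """
    return alt_fwd > 0 and alt_rev > 0


if __name__ == "__main__":
    # Amplicon-like case: all variant reads come from one direction.
    print(passes_single_strand_filter(alt_fwd=1677, alt_rev=0))
    # Enrichment-like case: variant reads come from both directions.
    print(passes_single_strand_filter(alt_fwd=820, alt_rev=857))
```

A rule like this suits assays where coverage from both directions is expected, and would reject every variant in an amplicon that is only sequenced from one side.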

More info: https://github.com/Illumina/Pisces/wiki/Pisces-5.2.7-Supported-Options

ReadStrandBias.txt is really just a debugging output. I think it is probably forward, reverse, and stitched depth counts, or something like that. If you are comfortable looking at the code, you should be able to trace it through.

tamsen commented 6 years ago

Follow-up:

Regarding your command line: you have very high depth, but you are allowing quite low-quality bases (Q10 noise), and the qscore of most of your variants is quite low (~0). So I am not surprised they are not passing the SB filters. For example:

--minvf 0.0002 --mindp 1 --minvariantqscore 0 --minbasecallquality 10 --minmapquality 10

->

chr7 55242453 . C T 0 q30;SB DP=56579 GT:GQ:AD:DP:VF:NL:SB 0/1:0:56565,14:56579:0.00025:10:NaN
chr7 55242454 . G A 0 q30;SB DP=56616 GT:GQ:AD:DP:VF:NL:SB 0/1:0:56602,14:56616:0.00025:10:NaN

But you do seem to have two more likely real variants that are passing filters:

chr12 25398298 . C CG 100 PASS DP=42573 GT:GQ:AD:DP:VF:NL:SB 0/1:100:36752,5821:42573:0.13673:10:-100.0000
chr12 25398299 . A C 100 PASS DP=42573 GT:GQ:AD:DP:VF:NL:SB 0/1:100:36757,5809:42573:0.13645:10:-100.0000
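For comparing records like these side by side, a small parser can pull out the filter flags, the SB value, and the variant frequency. This is a rough sketch that assumes single-sample records with whitespace-separated columns and the GT:GQ:AD:DP:VF:NL:SB format shown in this thread; the function name is made up for illustration.

```python
import math


def parse_record(record: str) -> dict:
    """Parse one single-sample VCF line of the kind quoted in this thread."""
    chrom, pos, _id, ref, alt, qual, filt, _info, fmt, sample = record.split()[:10]
    values = dict(zip(fmt.split(":"), sample.split(":")))
    ref_count, alt_count = (int(x) for x in values["AD"].split(","))
    depth = int(values["DP"])
    return {
        "site": f"{chrom}:{pos} {ref}>{alt}",
        "qual": float(qual),
        "filters": filt.split(";"),
        "ref_count": ref_count,
        "alt_count": alt_count,
        "depth": depth,
        "vf_reported": float(values["VF"]),
        "vf_from_counts": alt_count / depth,
        "sb": float(values["SB"]),  # float("NaN") parses to nan
    }


if __name__ == "__main__":
    records = [
        "chr7 55241707 . G A 0 q30;SB DP=70614 GT:GQ:AD:DP:VF:NL:SB "
        "0/1:0:68933,1677:70614:0.02375:10:NaN",
        "chr12 25398266 . G A 100 PASS DP=42537 GT:GQ:AD:DP:VF:NL:SB "
        "0/1:100:36721,5816:42537:0.13673:10:-100.0000",
    ]
    for rec in records:
        r = parse_record(rec)
        sb = "NaN" if math.isnan(r["sb"]) else f"{r['sb']:.4f}"
        print(r["site"], r["filters"], f"VF={r['vf_from_counts']:.5f}", f"SB={sb}")
```

Running it on the two records from the original question shows the pattern discussed above: the record with Q=0 carries both the q30 and SB flags with an SB of NaN, while the Q=100 record passes with an SB of -100.0000.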

Unless I missed the problem, I think your results are fine.

ShannonDaddy commented 5 years ago

Thanks a lot, I'll check it again.