Closed ArthurDondi closed 2 months ago
Thanks for spotting this! I agree that it's important to document that SComatic uses a default max_depth of 8000, and to add command line options to BaseCellCounter.py
and SingleCellGenotype.py
that allow to modify max_depth. We're working with (very deeply sequenced) mission bio data, and increasing max depth made a day-and-night difference for us.
For their defense, it's a samtools/pysam default behaviour that is quite unexpected (at least to me).
It can have a big impact on very deep sequencing but also in low-depth + high-cell count when using SingleCellGenotype.py
Dear ArthurDondi , Thanks for spotting this issue. Yes, a PR would be great, thanks a lot
Dear users, Thanks for this interesting finding. We will accept the PR in the coming days.
Thanks again for your feedback, Fran
We have now accepted the PR. Thanks!
pysam.pileup
max_depth
argument is by default 8000, however for RNA-seq there are edge cases as below, where a germline intronic SNP in a strongly expressed gene will appear as a false-positive somatic SNV:Top row is the cancer cells sample, mutated with 14% VAF and 19% CCF (max depth in exon ~17000) and the bottom row is the non-cancer cells sample with 4% VAF and 6% CCF (max depth in exon ~70000). This looks like a germline SNP.
With the current
max_depth=8000
I got a PASS mutation because only a fraction of the 475 reads are processed :chr11 61965525 61965525 G A PASS Cancer AAGGG AACAC 1 25 15 4 4 0.16 0.2667 0.0001 0.0 2 2 1;115;0.1465 1;92;0.1055 . PASS DP|NC|CC|BC|BQ|BCf|BCr 94|81|1:0:0:79:0:0|1:0:0:92:0:0|93:0:0:8556:0:0|0:0:0:14:0:0|1:0:0:78:0:0 25|15|4:0:0:13:0:0|4:0:0:21:0:0|372:0:0:1953:0:0|0:0:0:2:0:0|4:0:0:19:0:0
If I raise it to, say,
max_depth=200000
it's aCell_type_noise
, as it should:chr11 61965525 61965525 G A,A Cell_type_noise NonCancer,Cancer AAGGG AACAC 1 479,44 239,26 19,6 15,5 0.0397,0.1364 0.0628,0.1923 0.0017,0.0 0.0,0.0 2 2 1;498;0.3121 1;245;0.2041 . BetaBin_problem,PASS DP|NC|CC|BC|BQ|BCf|BCr 479|239|15:1:0:233:0:0|19:1:0:453:0:0|1767:93:0:42129:0:0|0:0:0:48:0:0|19:1:0:405:0:0 44|26|5:0:0:25:0:0|6:0:0:37:0:0|558:0:0:3441:0:0|0:0:0:3:0:0|6:0:0:34:0:00
pysam.pileup occurs in
BaseCellCounter.py
andSingleCellGenotype.py
. Let me know if you're interested in a PR. There is also a filter bug inbasecallingstep1
as signaled in #16.