genome / bam-readcount

Count bases in BAM/CRAM files
MIT License
305 stars 95 forks source link

Deletion counts behaviour #54

Closed iranmdl closed 6 years ago

iranmdl commented 6 years ago

Hello! I have a question regarding how deletions are counted in bam-readcount. Let me show you an example.

I have these two variants:

1   85610808    .   CAGA    C   893.73  .   AC=1;AF=0.5;AN=2;BaseQRankSum=-0.674;ClippingRankSum=0;DP=981;ExcessHet=3.0103;FS=9.835;MLEAC=1;MLEAF=0.5;MQ=25.78;MQRankSum=1.866;QD=1.04;ReadPosRankSum=0.665;SOR=0.186   GT:AD:DP:GQ:PL  0/1:778,80:858:99:931,0,32168
1   88266278    rs387873782 TCTGGGAGGGCTTGCTCCGGGGGCAGTGTGTCCTGTTCTTGTGCAGCCCCTG    T   12561.7 .   AC=1;AF=0.5;AN=2;BaseQRankSum=2.115;ClippingRankSum=0;DB;DP=639;ExcessHet=3.0103;FS=2.461;MLEAC=1;MLEAF=0.5;MQ=60;MQRankSum=0;QD=20.59;ReadPosRankSum=0.864;SOR=0.514   GT:AD:DP:GQ:PL  0/1:285,325:610:99:12599,0,10870

I want to see these two deletions in the bam-readcount output:

1   85610809    A   1482    =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:1398:16.98:35.64:0.02:482:916:0.47:0.01:11.16:482:0.38:74.87:0.50 C:1:20.00:15.00:0.00:0:1:0.24:0.04:40.00:0:0.00:75.00:0.88  G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  T:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  -AGA:83:19.57:0.00:0.00:43:40:0.65:0.04:0.67:43:0.44:75.00:0.46
1   88266279    C   384 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  C:383:60.00:37.11:0.31:178:205:0.27:0.00:9.55:178:0.30:65.91:0.40   G:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00  T:1:60.00:7.00:0.00:0:1:0.75:0.03:18.00:0:0.00:75.00:0.63   N:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00

So for the first deletion, I can see the deletion in bam-readcount, but not for the second. If I visualize the aligments of these two positions on IGV: 1_85610808 1_88266279

Both seem to be true deletions. My question is, why only the first one is reported as a deletion in bam-readcount? Is it due to the length of the deletion? Maybe due to the spanning reads supporting the deletion in the first variant but not in the second one?

Thank you in advance,

ernfrid commented 6 years ago

Unfortunately bam-readcount is fairly stupid when it comes to this sort of thing. It can only count up support from gapped alignments. It looks to me like these don't exist in your BAM for the larger variant and thus you won't be able to get counts from that BAM. Some variant callers that perform realignment internally can generate a BAM containing those realignments that may be more useful for getting counts than the initial BAM.

crazyhottommy commented 6 years ago

I had the same experience. bamreadcount missed many long indels called by lancet. https://github.com/nygenome/lancet/issues/11

ernfrid commented 6 years ago

Yes, this is expected behavior for variant callers that perform any sort of assembly or realignment. The counts from the caller itself are certainly going to be more reliable than anything bam-readcount produces.