Open andypexai opened 7 months ago
I'll make a genome of 30 bases, and my BAM has one read, 21 bases in length, that spans from bases 5-25 (1-based start) or 4-25 (0-based start):
██████FAKE_READ██████ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1 2 3 123456789012345678901234567890
It has this as a BAM:
$ samtools view -h test.bam @HD VN:1.5 SO:coordinate @SQ SN:chr1 LN:30 @PG ID:samtools PN:samtools VN:1.19.2 CL:samtools view -h test.bam FAKE_READ 0 chr1 5 1 21M * 0 0 * * AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:21 YT:Z:UU
It should be 4 bases depth 0, followed by 21 bases depth 1, followed by 5 bases at depth 0. See the samtools output below:
$ samtools depth -a test.bam chr1 1 0 chr1 2 0 chr1 3 0 chr1 4 0 chr1 5 1 chr1 6 1 chr1 7 1 chr1 8 1 chr1 9 1 chr1 10 1 chr1 11 1 chr1 12 1 chr1 13 1 chr1 14 1 chr1 15 1 chr1 16 1 chr1 17 1 chr1 18 1 chr1 19 1 chr1 20 1 chr1 21 1 chr1 22 1 chr1 23 1 chr1 24 1 chr1 25 1 chr1 26 0 chr1 27 0 chr1 28 0 chr1 29 0 chr1 30 0
And just to provide a little more info, here are a couple bedtools operations:
$ bedtools bamtobed -i test.bam chr1 4 25 FAKE_READ 1 + $ printf "chr1\t0\t30\n" > chr1.bed $ bedtools coverage -a test.bam -b chr1.bed chr1 4 25 FAKE_READ 1 + 4 25 0,0,0 1 21, 0, 1 21 21 1.0000000
Everything looks as expected to me. Now I'll make the d4 file:
$ d4tools create -q=1 test.bam test.d4 $ d4tools view test.d4 chr1 0 4 0 chr1 4 26 1 chr1 26 30 0
Here it appears that the d4tools believes my read is 22 bases long, so the end coordinate is off by one, more specifically it's +1 more than it should be.
@38 this does look like a bug. Do you possibly have time to have a look?
I'll make a genome of 30 bases, and my BAM has one read, 21 bases in length, that spans from bases 5-25 (1-based start) or 4-25 (0-based start):
It has this as a BAM:
It should be 4 bases depth 0, followed by 21 bases depth 1, followed by 5 bases at depth 0. See the samtools output below:
And just to provide a little more info, here are a couple bedtools operations:
Everything looks as expected to me. Now I'll make the d4 file:
Here it appears that the d4tools believes my read is 22 bases long, so the end coordinate is off by one, more specifically it's +1 more than it should be.