Hi,
I am checking MHL values of my BAM files and see something unexpected. The sample BAM I can share holds alignments of amplicon bisulfite sequencing of five regions to a very high depth (5000-50000x) - can be downloaded from here due to size (40MB).
First, the -d option seem to be more restrictive than it should. Default (-d 10) leads to most of CpGs excluded: only 44 out of 138 with the coverage >1000 are included in the output. Specifying -d 100 leads to output with only 14 genomic positions.
Second, the MHL output values are quite strange. ~99% of reads in this BAM have almost no methylated cytosines and ~1% of reads are nearly fully methylated in CpG context. I expect to see MHL (much) lower than 0.01, but reported values include 0.65 for 17 positions, which does not seem to be correct.
Third, which is more a suggestion rather then an error - maybe metheor could skip records without XM tag, simply reporting their number. Unmapped records are allowed and often present in BAM and they shouldn't have XM in them. samtools can of course be used to create valid input for metheor, but maybe you consider this more permissive behaviour worth implementing.
And fourth (which is also a suggestion) - the positions reported should probably be 1-based rather than 0-based. Now it seems that CpG spans 3bp:
chr17 43125376 43125378 0.6556599
Tested this on arm64 Mac, manually compiled the latest metheor (2895cb3) with cargo build. I did not try the reference implementation of MHL, but maybe you have it up and running.
Hi, I am checking MHL values of my BAM files and see something unexpected. The sample BAM I can share holds alignments of amplicon bisulfite sequencing of five regions to a very high depth (5000-50000x) - can be downloaded from here due to size (40MB).
First, the
-d
option seem to be more restrictive than it should. Default (-d 10
) leads to most of CpGs excluded: only 44 out of 138 with the coverage >1000 are included in the output. Specifying-d 100
leads to output with only 14 genomic positions.Second, the MHL output values are quite strange. ~99% of reads in this BAM have almost no methylated cytosines and ~1% of reads are nearly fully methylated in CpG context. I expect to see MHL (much) lower than 0.01, but reported values include 0.65 for 17 positions, which does not seem to be correct.
Third, which is more a suggestion rather then an error - maybe metheor could skip records without XM tag, simply reporting their number. Unmapped records are allowed and often present in BAM and they shouldn't have XM in them. samtools can of course be used to create valid input for metheor, but maybe you consider this more permissive behaviour worth implementing.
And fourth (which is also a suggestion) - the positions reported should probably be 1-based rather than 0-based. Now it seems that CpG spans 3bp:
chr17 43125376 43125378 0.6556599
Tested this on arm64 Mac, manually compiled the latest metheor (2895cb3) with
cargo build
. I did not try the reference implementation of MHL, but maybe you have it up and running.