mgcam closed this issue 4 years ago.
Had the BAM file been processed to correct any ambiguity bases? If not, there will be differences when CRAM regenerates the MD tags, because it uses the real reference base rather than the "random" [ACGT] base that BWA substitutes for N and other ambiguity codes.
I'm assuming that was the only affected field.
We have not updated this to support any relevant API changes in the C layer, so there may be some impact there.
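To illustrate the MD-tag point, here is a toy Python sketch. It is not bam_stats' or htslib's actual code. It assumes a gapless alignment (CIGAR all M) and that a read base aligned to N or any other ambiguity code always counts as a mismatch; how htslib really scores IUPAC codes may differ. `bwa_style_reference`, `md_tag` and `mismatch_count` are hypothetical helpers. The same read gets a clean MD against BWA's randomised reference but picks up mismatches once the real reference base (N) is used.

```python
import random

ACGT = "ACGT"


def bwa_style_reference(ref: str, seed: int = 0) -> str:
    """Hypothetical helper: replace every non-ACGT base (N, R, Y, ...) with a
    pseudo-random A/C/G/T, imitating how BWA's index stores ambiguity codes."""
    rng = random.Random(seed)
    return "".join(b if b in ACGT else rng.choice(ACGT) for b in ref.upper())


def is_match(read_base: str, ref_base: str) -> bool:
    # Assumption: only identical, unambiguous bases count as a match, so any
    # read base aligned to N or another IUPAC code is a mismatch.
    return read_base == ref_base and ref_base in ACGT


def md_tag(read: str, ref: str) -> str:
    """Build an MD string for a gapless alignment (no insertions/deletions)."""
    parts, run = [], 0
    for r, g in zip(read, ref):
        if is_match(r, g):
            run += 1
        else:
            # MD alternates match-run lengths and reference bases; a zero
            # separates adjacent mismatches, e.g. "4N0N4".
            parts.append(f"{run}{g}")
            run = 0
    parts.append(str(run))
    return "".join(parts)


def mismatch_count(read: str, ref: str) -> int:
    """Number of aligned positions that disagree with the reference."""
    return sum(not is_match(r, g) for r, g in zip(read, ref))


real_ref = "ACGTNNACGT"
bwa_ref = bwa_style_reference(real_ref, seed=42)
read = bwa_ref  # a read that happens to carry BWA's substituted bases

print(md_tag(read, bwa_ref), mismatch_count(read, bwa_ref))    # 10 0
print(md_tag(read, real_ref), mismatch_count(read, real_ref))  # 4N0N4 2
```

Summed over a whole genome, per-read shifts like this could plausibly account for small differences in divergent-base counts between BAM and CRAM.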
What input do you use in your pipeline? We'd like to get the same results as you.
You haven't answered the questions I asked to try to identify the cause.
We are using htslib 1.9, as indicated in the Dockerfile.
I'm not sure how the input for our pipeline is relevant to you not getting the same result from bam_stats for BAM and CRAM files of the same data.
We tested bam_stats v4.4.1 on a CRAM and a BAM file of the same data and found tiny differences in the #_divergent_bases values (for example, 264907761 for BAM vs 264911747 for CRAM). Is this a known issue? The tool was compiled against libhts v1.10.2. (Reported on behalf of NPG.)
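For scale, the gap between the two #_divergent_bases counts reported for this data works out as follows, taking the BAM count as the baseline:

$$
\Delta = 264\,911\,747 - 264\,907\,761 = 3\,986,
\qquad
\frac{\Delta}{264\,907\,761} \approx 1.5 \times 10^{-5} \approx 0.0015\%
$$

That is roughly 15 extra divergent bases per million in the CRAM-derived count.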