xingyaoc closed this issue 1 year ago.
@xingyaoc,
Thanks for reporting the issue! We confirmed that the nondeterminism itself came from samtools merge. The issue occurs when the order of the input BAMs passed to samtools merge changes. Please try using the same order of input BAM filenames.
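To see why input order can matter even when the merged files contain the same reads, here is a toy model of a k-way merge of coordinate-sorted inputs. It is not samtools' actual code. It assumes that reads at an identical coordinate are emitted in input-file order, which is how a stable k-way merge such as Python's heapq.merge behaves. Reads are represented as hypothetical (reference index, position, name) tuples.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical toy read record: (reference index, 0-based position, read name).
Read = Tuple[int, int, str]

def merge_sorted(inputs: List[Iterable[Read]]) -> List[str]:
    """k-way merge of coordinate-sorted inputs, returning read names in output order.

    heapq.merge is stable: when two records have equal keys, the one from the
    earlier input iterable is emitted first, so the order of tied reads depends
    on the order of `inputs` (an assumed stand-in for samtools merge behavior).
    """
    merged = heapq.merge(*inputs, key=lambda r: (r[0], r[1]))
    return [name for _, _, name in merged]

bam_a = [(0, 100, "readA1"), (0, 200, "readA2")]
bam_b = [(0, 100, "readB1"), (0, 150, "readB2")]

print(merge_sorted([bam_a, bam_b]))  # ['readA1', 'readB1', 'readB2', 'readA2']
print(merge_sorted([bam_b, bam_a]))  # ['readB1', 'readA1', 'readB2', 'readA2']
```

Both outputs contain the same reads at the same positions; only the two reads tied at position 100 swap places. A downstream tool that is sensitive to read order within a position could therefore produce different results from the two merges.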
Zhenxian
This issue has already been resolved, but I am sharing my email thread with @zhengzhenxian about it here.
ClairS returns slightly different results depending on the order in which BAM files are merged with samtools merge. While samtools merge does produce merged BAMs with different read ordering, I do not think this should affect variant calling. Could you help me understand (a) how these functionally identical BAM files cause differences in the ClairS output, and (b) how significant these differences are?
Thanks for your quick response to my issue. My team and I are taking your advice and trying to force deterministic ClairS output in our pipeline. Following your first suggestion, we filtered out non-unique reads and verified with BamUtil diff that two separate BAM files are functionally identical, apart from multiple reads mapped to the same position still being ordered differently. However, running ClairS on these two BAM files again returned different results. I was also unable to detect any differences between the pileups, whether from the ClairS temporary output files or from my own mpileup runs. Can you please advise us on the source of the nondeterminism? We believe that sorting the BAM files before merging would work, and we are in the process of implementing this in our pipeline, but my team would like to understand the root cause so that we can properly assess the validity of our results.