Closed HongyuanChu closed 5 months ago
Which version are you using?
I am using PanDepth-2.24-Linux-x86_64; If adding a bed file pandepth -i /data/chuhy/project/Hg/HG002_GRCh38_ONT-UL_GIAB_20200122.phased.bam -o /data/chuhy/project/Hg/pandepth/Hg002 -b HG002_GRCh38_1_22_v4.2.1_benchmark_noinconsistent.bed It would not >100% anymore
Can you provide a download link for the UL ont of HG002 as well as the reference genome? Also the commands you alignment are needed. I would try to reproduce your results as closely as possible and then look for the exact bug.
Thank you very much!
The bam files were standard samples from GIAB. Reads were separately aligned to both hs37d5 and hg38 human genome reference assemblies using Heng Li's minimap2 (https://github.com/lh3/minimap2) with the following options:
-a -z 600,200 -x map-ont
I apologize for the delayed response. We have identified the cause of the bug. In order to conserve memory usage, PanDepth employs a sliding window approach for calculation when detecting the index of bam/cram files. However, this method presents an issue: if the window size is set too small, certain reads may span multiple windows, particularly when a read contains numerous or very long insertions or deletions. As a result, some genomic positions are likely to be redundantly calculated, leading to coverage values exceeding 100%. To comprehensively address this issue and further enhance the multi-threading efficiency of PanDepth, we have decided to perform a complete code refactoring. However, this process may require a substantial amount of time to complete.
As a temporary solution to your issue, you can rename the index file suffix of the bam from 'bai' to another name, such as 'baix', and then recalculate. This will yield the correct results.
It should be noted that when performing calculations without using the index, you can still enable multi-threading. However, setting '-t' to a value exceeding 5 will not further improve computational efficiency.
This is the result of the calculation without using the index for the ONT BAM file of HG002(HG002_GRCh38_ONT-UL_GIAB_20200122.phased.bam).
Thank you so much for your timely and useful reply!
This is the result of the coverage calculation using Samtools, which took 160 minutes to complete. (If feasible, we can temporarily keep this issue open so that I can inform you promptly upon completing the code refactoring.)
We have found a solution to the problem and will release version 2.25 today (or tomorrow ) after the test has no problems
This is the result of the calculation by version 2.25 using the index for the ONT BAM file.
Command line: pandepth -i /data/chuhy/project/Hg/HG002_GRCh38_ONT-UL_GIAB_20200122.phased.bam -o /data/chuhy/project/Hg/pandepth/Hg002 -t 5 Some results have Coverage>100%.