Closed pontushojer closed 4 years ago
chr1 also doesn’t follow the pattern. Or is this an artifact?
Yeah, it's true that chr1 also takes longer than expected. It was not as striking as chrY, but it might still relate to a shared issue.
I checked a separate dataset and compared against all other rules that are run for the chunks. Here chr1, chr16, chr21 (to a lesser extent) and chrY stick out from the rest (see the red trace for clusterrmdup).
The runtime for the clusterrmdup (find_clusterdups after PR https://github.com/NBISweden/BLR/pull/30 merges) step is much longer for chrY than for other chromosomes. See the graphs below, generated from data in /proj/uppstore2018173/private/pontus/runs/200819._synchronise-merges_rerun. I have also seen this phenomenon in other runs.
Runtime vs mean coverage for each chromosome. For chrM this is the sum of all small contigs that make up this "chunk".
Runtime vs total contig length for each chromosome. For chrM this is the sum of all small contigs that make up this "chunk".
From these figures it is clear that chrY, for some reason, takes longer than would be predicted from coverage and contig length. chr16 also somewhat breaks this pattern.
What could be the reason for this??
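One way to make "longer than predicted" concrete is to fit runtime against contig length (or coverage) and look at which chunks sit far above the fitted line. Below is a minimal sketch of that idea, assuming the per-chunk runtimes and sizes have already been collected into plain lists. The function name, the threshold and the numbers in the demo are hypothetical placeholders, not values from the runs above.

```python
from statistics import mean, pstdev


def flag_slow_chunks(names, sizes, runtimes, threshold=1.5):
    """Return names whose runtime lies more than `threshold` residual
    standard deviations above an ordinary least-squares line of
    runtime vs size. (Hypothetical helper, not part of BLR.)"""
    mx, my = mean(sizes), mean(runtimes)
    sxx = sum((x - mx) ** 2 for x in sizes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Positive residual = slower than the fit predicts.
    residuals = [y - (slope * x + intercept) for x, y in zip(sizes, runtimes)]
    sd = pstdev(residuals)
    return [n for n, r in zip(names, residuals) if sd > 0 and r > threshold * sd]


if __name__ == "__main__":
    # Made-up placeholder data: chunk "c" is deliberately slow for its size.
    names = ["a", "b", "c", "d", "e", "f"]
    sizes = [1, 2, 3, 4, 5, 6]
    runtimes = [1, 2, 13, 4, 5, 6]
    print(flag_slow_chunks(names, sizes, runtimes))
```

Note that a strong outlier also pulls the fitted line towards itself, so with only a couple of dozen chunks a robust fit, or a fit that leaves out the suspected chunk, may give a cleaner picture.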