Open LeeTL1220 opened 7 years ago
A little more info about the interval file:
lichtens@OncobuntuMk3:~$ head /home/lichtens/broad_oncotator_configs/allchr.1kg.phase3.v5a.snp.maf10.biallelic.recode.fixed.prune5.trim1M.test.interval_list
1:14604-14604
1:14930-14930
1:15211-15211
1:15820-15820
1:30923-30923
1:49298-49298
1:51479-51479
1:54716-54716
1:55545-55545
1:58814-58814
lichtens@OncobuntuMk3:~$ wc /home/lichtens/broad_oncotator_configs/allchr.1kg.phase3.v5a.snp.maf10.biallelic.recode.fixed.prune5.trim1M.test.interval_list
999914 999914 21793054 /home/lichtens/broad_oncotator_configs/allchr.1kg.phase3.v5a.snp.maf10.biallelic.recode.fixed.prune5.trim1M.test.interval_list
lichtens@OncobuntuMk3:~$
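For anyone reproducing this without the original file, here is a minimal Python sketch of how lines in that format could be parsed. It assumes every line is `contig:start-end` with 1-based inclusive coordinates, which matches the `head` output above but is not verified against the full file. The names `Interval` and `parse_interval` are hypothetical helpers, not part of GATK or htsjdk.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_interval(line: str) -> Interval:
    """Parse one 'contig:start-end' line (hypothetical helper)."""
    # Split on the last ':' so contig names containing ':' still parse.
    contig, _, span = line.strip().rpartition(":")
    start, _, end = span.partition("-")
    return Interval(contig, int(start), int(end))


# First three lines of the head output shown above.
lines = ["1:14604-14604", "1:14930-14930", "1:15211-15211"]
intervals = [parse_interval(line) for line in lines]

# Count how many intervals cover exactly one base.
single_base = sum(1 for iv in intervals if iv.start == iv.end)
print(f"{single_base} of {len(intervals)} intervals cover a single base")
```

Every interval in the sample covers a single base, so the file is effectively a list of roughly one million individual sites.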
The issue is definitely in htsjdk. GATK is just calling queryOverlapping() a single time per file with all of the intervals, having first called QueryInterval.optimizeIntervals() on the interval list, as required by the API.
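To illustrate why the optimize step probably does not shrink this particular list much, here is a rough Python sketch of a sort-and-merge pass over intervals. It is not the htsjdk implementation: it assumes that optimizing means sorting and then merging overlapping or abutting intervals, and it sorts by contig name rather than by reference index. `merge_intervals` is a hypothetical name.

```python
def merge_intervals(intervals):
    """Sort (contig, start, end) tuples and merge overlapping or abutting ones.

    Illustrative stand-in only; sorts contigs as strings, whereas a real
    BAM query would order them by reference index.
    """
    merged = []
    for contig, start, end in sorted(intervals):
        # Merge when the interval is on the same contig and starts no later
        # than one base past the end of the previous interval.
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


# Scattered single-base sites, like the ones in the interval file.
sites = [("1", 14930, 14930), ("1", 14604, 14604), ("1", 15211, 15211)]
# Abutting intervals, for contrast.
adjacent = [("1", 100, 100), ("1", 101, 101), ("1", 103, 110)]

print(len(merge_intervals(sites)))
print(merge_intervals(adjacent))
```

If the sites are mostly non-adjacent, as the `head` output suggests, nothing merges and the single query still carries close to a million intervals.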
Included is a JProfiler screenshot showing where it is spending all of its time. As you can see below, it has not even run any of the LocusWalker.apply(...) code.