Closed akiezun closed 9 years ago
Case 1: CountVariants vs CountRODs
file /humgen/gsa-hpprojects/GATK/bundle/current/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf
(size 1.9Gb)
GATK4 run using build/install/gatk/bin/gatk (i.e. not from a big jar)
running on Mac OS X 10.9.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17
56-69 seconds (6 runs) on GATK 3.4-46-gbc02625
34-39 seconds (6 runs) on GATK 4.pre-alpha-41-ge1cafbb-SNAPSHOT
GATK3.4 has an additional ~3-6s startup/winddown time, GATK4 has an additional ~2s startup/winddown time
on profiling, it's clear that the engine adds almost no overhead on top of htsjdk iterators - see screenshot from jprofiler
closing this as resolved. We win and there's no obvious badness in the profile.
reopen - will include NFS too
Case 2: running on NFS.
vcfFile /humgen/gsa-hpprojects/GATK/bundle/current/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf
(size 1.9Gb)
reference /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
Running on the dataflow01 host: Linux 2.6.32-573.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132.
67-69 seconds (3 runs) GATK v3.4-46-gbc02625
40-41 seconds (3 runs) GATK 4.pre-alpha-45-g168cd60
example time GATK3:
real 1m13.062s
user 1m22.039s
sys 0m13.290s
example time GATK4
real 0m40.728s
user 1m38.028s
sys 0m4.842s
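For anyone comparing these numbers, here is a minimal sketch (not part of the original runs) that converts the `time` output above into seconds and derives the wall-clock speedup and total CPU time. The helper names `parse_time` and `summarize` are made up for illustration, and the input format is assumed to be the bash `time` style shown above.

```python
import re

def parse_time(value):
    """Convert a bash `time` value such as '1m13.062s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognized time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def summarize(times):
    """Return real/user/sys in seconds plus total CPU time (user + sys)."""
    secs = {name: parse_time(value) for name, value in times.items()}
    secs["cpu"] = secs["user"] + secs["sys"]
    return secs

# Figures copied from the Case 2 example runs in this thread.
gatk3 = summarize({"real": "1m13.062s", "user": "1m22.039s", "sys": "0m13.290s"})
gatk4 = summarize({"real": "0m40.728s", "user": "1m38.028s", "sys": "0m4.842s"})

# Wall-clock speedup of GATK4 over GATK3 for this single pair of runs.
wall_speedup = gatk3["real"] / gatk4["real"]

print(f"GATK3: real={gatk3['real']:.3f}s cpu={gatk3['cpu']:.3f}s")
print(f"GATK4: real={gatk4['real']:.3f}s cpu={gatk4['cpu']:.3f}s")
print(f"wall-clock speedup: {wall_speedup:.2f}x")
```

Note that in both runs user + sys exceeds real, so some work ran on more than one thread; which threads those are (for example JVM background threads) is not established by these numbers.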
Case 3: bigger file on NFS
file /humgen/gsa-hpprojects/GATK/bundle/current/b37/dbsnp_138.b37.vcf
(10Gb)
ref /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
Running on the dataflow01 host: Linux 2.6.32-573.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132.
6 minutes (2 runs) GATK v3.4-46-gbc02625
4.1 minutes (2 runs) GATK 4.pre-alpha-45-g168cd60
example time GATK3
real 6m7.731s
user 6m39.912s
example time GATK4
real 4m6.494s
user 6m8.810s
resolved
The goal is to be at least as fast as GATK 3.4 on a single thread. This is for the walker version of the tool. The ticket can be split into a) profile and b) optimize, if needed.
Note: the GATK3.4 version is called CountRODs
The reason to do this is to see if the engine itself adds any overhead.
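The goal stated above can be checked mechanically against the run ranges reported in this thread. The following is a rough sketch only: the dictionary layout and the helpers `meets_goal` and `conservative_speedup` are hypothetical, and the Case 3 figures are the rounded "6 minutes" and "4.1 minutes" converted to seconds.

```python
# (fastest, slowest) wall-clock seconds per tool, as reported in the thread.
timings = {
    "case 1 (Mac OS X, 1.9Gb)": {"gatk3": (56, 69), "gatk4": (34, 39)},
    "case 2 (NFS, 1.9Gb)": {"gatk3": (67, 69), "gatk4": (40, 41)},
    # Case 3 only gives rounded minutes: 6 min and 4.1 min.
    "case 3 (NFS, 10Gb)": {"gatk3": (360, 360), "gatk4": (246, 246)},
}

def meets_goal(case):
    """True if the slowest GATK4 run is no slower than the fastest GATK3 run."""
    return case["gatk4"][1] <= case["gatk3"][0]

def conservative_speedup(case):
    """Fastest GATK3 time divided by slowest GATK4 time (worst case for GATK4)."""
    return case["gatk3"][0] / case["gatk4"][1]

for name, case in timings.items():
    status = "ok" if meets_goal(case) else "regression"
    print(f"{name}: {status}, conservative speedup {conservative_speedup(case):.2f}x")
```

Comparing the slowest GATK4 run with the fastest GATK3 run is deliberately pessimistic, so a pass here does not depend on run-to-run noise within the reported ranges.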