akiezun closed 9 years ago
Case 1: VariantFiltration (-filter 'DP > 100')
file /humgen/gsa-hpprojects/GATK/bundle/current/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf (1.9 Gb)
Running on a Mac laptop with SSD. Mac OS X 10.9.5 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_25-b17.
92-115 seconds (3 runs) on GATK 4.pre-alpha-41-ge1cafbb-SNAPSHOT
103-128 seconds (3 runs) on GATK 3.4-46-gbc02625
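For anyone repeating the comparison, here is a minimal sketch of how a "min-max seconds (N runs)" figure like the ones above could be collected. It is not the script used for these numbers; the helper name `time_runs` is hypothetical and the command shown is a placeholder to be replaced with the actual tool invocation.

```python
import subprocess
import sys
import time


def time_runs(cmd, runs=3):
    """Run cmd `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status,
        # so a failed run is never counted as a valid timing.
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Placeholder command (assumption): substitute the real invocation here.
    cmd = [sys.executable, "-c", "pass"]
    timings = time_runs(cmd, runs=3)
    print(f"{min(timings):.0f}-{max(timings):.0f} seconds ({len(timings)} runs)")
```

Reporting the range rather than a single number keeps run-to-run noise (file system cache, other load on the machine) visible.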
Here's a profile. It's clear that all the time goes into reading and writing, and almost no overhead comes from the engine. Closing this - we win and there are no obvious problems in the profile.
reopen - will look at NFS too
Case 2: file on NFS
file /humgen/gsa-hpprojects/GATK/bundle/current/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf (1.9 Gb)
ref /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
Linux 2.6.32-573.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132.
2.9-3 minutes (2 runs) on v3.4-46-gbc02625
2.2 minutes (2 runs) on GATK 4.pre-alpha-45-g168cd60
time GATK3
real 3m3.714s
user 3m52.474s
time GATK4
real 2m14.264s
user 3m17.439s
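To put the two `real` values side by side, here is a small sketch that converts the `time` output into seconds and computes the ratio. The helper name `parse_time` is hypothetical; the input strings are copied from the timings above.

```python
import re


def parse_time(value):
    """Convert a shell `time` value such as '3m3.714s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times reported above.
gatk3_real = parse_time("3m3.714s")
gatk4_real = parse_time("2m14.264s")

speedup = gatk3_real / gatk4_real      # how many times faster GATK4 finished
saved = 1 - gatk4_real / gatk3_real    # fraction of wall-clock time saved

print(f"GATK4 is {speedup:.2f}x faster ({saved:.0%} less wall-clock time)")
```

With these numbers GATK4 comes out roughly 1.37x faster on wall-clock time, about 27% less than GATK3 on the NFS case.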
resolving
but see #1129
The goal is to be at least as fast as GATK 3.4 on a single thread. This is for the walker version of the tools. The ticket can be split into a) profile and b) optimize if needed.