Closed droazen closed 9 years ago
A quick one for @akiezun
It's bad, as expected.
Test case: a 1.9Gb file on NFS, /humgen/gsa-hpprojects/GATK/bundle/current/b37/CEUTrio.HiSeq.WGS.b37.bestPractices.b37.vcf
Reference: /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
Run on dataflow01.broadinstitute.org (Linux 2.6.32-573.3.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0-b132).
Parameters: --clusterSize 3 --clusterWindowSize 100
Timings:
gatk3 (GATK 3.4-46-gbc02625): real 5m9.698s, user 4m34.835s
gatk4: real 8m45.901s, user 10m18.663s
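To put a number on "how much worse", here is a quick sketch that converts the timings above to seconds and takes the GATK4/GATK3 ratio. The helper name `to_seconds` is just for illustration; the inputs are the single-run figures reported above, so treat the ratios as rough.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style XmY.YYYs reading to seconds."""
    return minutes * 60 + seconds

# Timings copied from the single run reported above.
gatk3_real = to_seconds(5, 9.698)
gatk3_user = to_seconds(4, 34.835)
gatk4_real = to_seconds(8, 45.901)
gatk4_user = to_seconds(10, 18.663)

# Ratio > 1 means GATK4 is slower.
real_ratio = gatk4_real / gatk3_real
user_ratio = gatk4_user / gatk3_user

print(f"real: {real_ratio:.2f}x, user: {user_ratio:.2f}x")
# prints: real: 1.70x, user: 2.25x
```

So on this one run GATK4 takes roughly 1.7x the wall-clock time and about 2.25x the user CPU time of GATK3.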
Ok, then let's create a beta (not alpha) ticket to address this.
done #1151
VariantFiltration in GATK4 with clustered SNP filtering on is likely to underperform GATK3, as this results in queries against the driving source of variants both before and after the current variant, and we have caching turned off for the copy of the driving datasource added to the FeatureManager for querying (as our caching strategy for features is currently only able to look ahead).

Task is to run VariantFiltration on both GATK3 and 4 with --clusterSize and --clusterWindowSize, record how much worse GATK4 performs for this use case, and (assuming it does lose to GATK3) create a beta ticket to fix it (not urgent enough for alpha).
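For intuition on why a look-ahead-only cache does not help here, below is a toy model. It is not GATK's actual feature cache: the class `LookAheadCache`, the helper `count_misses`, and the rule that the cache is trimmed to the current variant position are all assumptions made for illustration. It only shows that a query window reaching behind the current position can never be served by a cache that holds records from the current position forward.

```python
class LookAheadCache:
    """Toy cache that only holds the interval [current position, some point ahead]."""

    def __init__(self, lookahead):
        self.lookahead = lookahead
        self.start = None  # first cached position (inclusive)
        self.end = None    # last cached position (inclusive)
        self.misses = 0

    def query(self, current_pos, q_start, q_end):
        # Assumption: anything behind the current position is dropped.
        if self.start is not None and current_pos > self.start:
            self.start = current_pos
        covered = (
            self.start is not None
            and self.start <= q_start
            and q_end <= self.end
        )
        if not covered:
            # Miss: refill from the current position forward only.
            self.misses += 1
            self.start = current_pos
            self.end = max(q_end, current_pos + self.lookahead)
        return covered


def count_misses(positions, window, lookahead, look_behind):
    """Traverse positions in order, querying a window around each one."""
    cache = LookAheadCache(lookahead)
    for pos in positions:
        q_start = pos - window if look_behind else pos
        cache.query(pos, q_start, pos + window)
    return cache.misses


positions = list(range(100, 1100, 100))  # 10 hypothetical variant positions
forward_only = count_misses(positions, window=100, lookahead=1000, look_behind=False)
both_sides = count_misses(positions, window=100, lookahead=1000, look_behind=True)
print(forward_only, both_sides)
# prints: 1 10
```

In this model, forward-only windows miss once and then hit for the rest of the traversal, while windows that extend before the current variant miss on every query, which is the pattern clustered SNP filtering produces.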