Feature request
Module(s) or script(s) involved
Module00c when run with snp_vcfs instead of gvcfs --> BAFFromShardedVCF
Description
Background: When a SNP VCF used for BAF is very large and has been neither pre-sharded nor filtered to PASS variants, GenerateBAF can take a long time and a lot of disk. For example, on a 419 GB unsharded SNP VCF, GenerateBAF ran for 27 hours and used 452 GB of disk; given the long runtime, a non-preemptible VM was required. The cost was relatively low (~$3.70), but if we start to encounter even larger SNP VCFs this may become untenable.
Proposed solutions: We should consider adding a separate task/workflow to shard the SNP VCF, or, alternatively, filter it. (Note that if we start to generate BAF from BAM instead, this will become unnecessary.)
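To make the two options concrete, here is a minimal Python sketch of what a shard-and-filter step could do. It is only an illustration, not the pipeline's implementation: the function name `filter_and_shard_vcf` and its parameters are hypothetical, it assumes an uncompressed, tab-delimited VCF read line by line, and it treats only a FILTER value of exactly `PASS` as passing (records with `.` are dropped).

```python
from typing import Iterable, Iterator, List


def filter_and_shard_vcf(
    lines: Iterable[str],
    records_per_shard: int,
    pass_only: bool = True,
) -> Iterator[List[str]]:
    """Yield shards as lists of VCF lines; every shard repeats the full header.

    Hypothetical helper for illustration only. Assumes all header lines
    (starting with '#') come before the first record.
    """
    if records_per_shard < 1:
        raise ValueError("records_per_shard must be at least 1")

    header: List[str] = []
    records: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        # FILTER is the 7th VCF column (index 6); keep only exact "PASS".
        if pass_only and (len(fields) < 7 or fields[6] != "PASS"):
            continue
        records.append(line)
        if len(records) == records_per_shard:
            yield header + records
            records = []
    # Emit the final, possibly smaller, shard.
    if records:
        yield header + records
```

Because the function streams its input and holds at most one shard in memory, each shard could be written out and handed to a separate, shorter GenerateBAF-style job. In practice this step would more likely be a task wrapping an existing VCF tool than hand-written parsing.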