Issue
The WGS pipeline can be noticeably time-consuming because of deep sequencing across the entire genome (~2 billion reads). It would be worthwhile to parallelize the post-alignment and variant calling steps wherever possible.
Approach
The parallelization can be done for the post-alignment and variant calling processes:
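Variant calling step: scatter-gather approach, i.e. splitting the reference into pieces (intervals), calling variants on each piece in parallel, and merging the per-piece results.
E.g. the Mutect2 tool can be run with a list of intervals (the -L option) to restrict processing to a subset of genomic regions.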
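The scatter-gather idea above can be sketched in Python as follows. This is only an illustration under assumptions: the helper names (split_into_intervals, mutect2_scatter_commands, merge_vcfs_command), the fixed chunk size, the file names, and the tumor-only invocation are hypothetical and not taken from the pipeline. The sketch only builds command lines; it does not run GATK.

```python
from typing import Dict, List


def split_into_intervals(contig_lengths: Dict[str, int], chunk_size: int) -> List[str]:
    """Split each contig into 1-based, inclusive intervals of at most chunk_size bases."""
    intervals = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            intervals.append(f"{contig}:{start}-{end}")
    return intervals


def mutect2_scatter_commands(reference: str, tumor_bam: str,
                             intervals: List[str], out_prefix: str) -> List[List[str]]:
    """Scatter: build one Mutect2 command per interval (hypothetical file naming)."""
    commands = []
    for i, interval in enumerate(intervals):
        commands.append([
            "gatk", "Mutect2",
            "-R", reference,
            "-I", tumor_bam,
            "-L", interval,  # restrict this shard to one genomic region
            "-O", f"{out_prefix}.shard{i:04d}.vcf.gz",
        ])
    return commands


def merge_vcfs_command(shard_vcfs: List[str], merged_vcf: str) -> List[str]:
    """Gather: build a MergeVcfs command that combines the per-shard VCFs."""
    cmd = ["gatk", "MergeVcfs"]
    for vcf in shard_vcfs:
        cmd += ["-I", vcf]
    cmd += ["-O", merged_vcf]
    return cmd


if __name__ == "__main__":
    # Toy contig lengths; real values would come from the reference .fai or .dict file.
    toy_contigs = {"chr1": 2500, "chr2": 1000}
    shards = split_into_intervals(toy_contigs, chunk_size=1000)
    scatter = mutect2_scatter_commands("ref.fasta", "tumor.bam", shards, "sample")
    for cmd in scatter:
        print(" ".join(cmd))
    print(" ".join(merge_vcfs_command([c[-1] for c in scatter], "sample.merged.vcf.gz")))
```

Each scatter command could then be submitted as a separate job (for example via a workflow manager or a process pool), with the gather command run once all shards finish. Mutect2 also writes a per-shard .stats file; these would need to be combined as well (GATK provides MergeMutectStats for this) before filtering the merged calls.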
Post-alignment step: MarkDuplicates + BaseRecalibrator + ApplyBQSR
Spark-enabled GATK tools (e.g. MarkDuplicatesSpark, BaseRecalibratorSpark, ApplyBQSRSpark)
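As a rough illustration of running a Spark-enabled GATK tool on a single multi-core machine, the sketch below builds a command line in Python. The helper name spark_local_command, the file names, and the core count are hypothetical. The Spark arguments placed after the "--" separator follow the form shown in the GATK README for local runs; the right settings for a real cluster would differ.

```python
from typing import List


def spark_local_command(tool: str, tool_args: List[str], cores: int) -> List[str]:
    """Build a gatk command for a Spark tool running locally with a fixed number of cores.

    Tool arguments come first; Spark runner arguments follow the "--" separator.
    """
    return [
        "gatk", tool, *tool_args,
        "--",
        "--spark-runner", "LOCAL",
        "--spark-master", f"local[{cores}]",
    ]


if __name__ == "__main__":
    # Hypothetical input/output names, for illustration only.
    cmd = spark_local_command(
        "MarkDuplicatesSpark",
        ["-I", "aligned.bam", "-O", "marked.bam"],
        cores=8,
    )
    print(" ".join(cmd))
```

The same helper could wrap the other Spark tools by changing the tool name and its arguments; those tools need additional inputs (such as a reference and known-sites files) that are not shown here.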