epam / fonda

Fonda is a framework that offers scalable, automated analysis of multiple NGS data types
Apache License 2.0

WGS pipeline parallelization #202

Open kamyshova opened 3 years ago

kamyshova commented 3 years ago

Issue

The WGS pipeline can be noticeably time-consuming because it involves deep sequencing across the entire genome (~2 billion reads). It would be great to parallelize the post-alignment and variant calling steps wherever possible.

Approach

Parallelization can be applied to the post-alignment and variant calling steps:

Spark-enabled GATK tools
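
To make the Spark option more concrete, here is a minimal sketch of how a command line for a Spark-enabled GATK4 tool could be assembled. It is not Fonda's implementation: the helper name `gatk_spark_command`, the file names, and the core count are hypothetical, and it assumes GATK4 is installed with its `gatk` launcher on the PATH and that the tool runs in Spark local mode on a single node. Only `MarkDuplicatesSpark` is shown, because other Spark tools (for example `BaseRecalibratorSpark`) need extra arguments such as a reference and known sites.

```python
from typing import List


def gatk_spark_command(tool: str, input_bam: str, output_bam: str,
                       cores: int = 4) -> List[str]:
    """Build an argv list for a Spark-enabled GATK4 tool in local mode.

    Hypothetical helper for illustration; it only builds the command
    and does not run it.
    """
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return [
        "gatk", tool,
        "-I", input_bam,
        "-O", output_bam,
        "--",  # arguments after this separator configure the Spark runner
        "--spark-runner", "LOCAL",
        "--spark-master", f"local[{cores}]",
    ]


if __name__ == "__main__":
    # Hypothetical file names; replace with real pipeline paths.
    cmd = gatk_spark_command("MarkDuplicatesSpark",
                             "sample.sorted.bam",
                             "sample.markdup.bam",
                             cores=8)
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a single string avoids shell quoting problems with the `local[8]` brackets.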