cbi-star / pb-human-wgs-workflow-wdl

BSD 3-Clause Clear License

The restructured wdl-pipeline #2

cbi-star opened this issue 1 year ago (status: Open)

cbi-star commented 1 year ago

The restructured pipeline in this repo is now called the automatic genome-sequence analysis pipeline, or agape for short. It introduces the following major changes relative to the main repo:

A unified input data structure and redesigned workflows restructure the WDL pipeline, with the aim of simplifying the WDL scripts and making them easier to maintain. The major changes are:
(1) affected and unaffected samples are now handled by a single line of code, using the new input data structure, in every WDL file where this applies;
(2) the struct CohortInfo is removed from structs.wdl and replaced with Array[SampleInfo] in the main workflow, and all related WDL files are changed accordingly;
(3) the struct SampleInfo is modified by adding one member: Boolean? affected;
(4) both sample-level and trio-level hifiasm run in the main workflow, trial.wdl;
(5) FASTA conversion runs only once, as a separate step, in a new workflow under process.smrtcells, fasta_conversion.wdl, whose output is used by jellyfish and by sample-level and trio-level hifiasm;
(6) new folders (hifiasm/tasks) and new WDL files handle the hifiasm computation independently, and the related workflows and tasks (rewritten or modified) are placed in these folders;
(7) all import links in the WDL files are changed to include /cbi-star/ where applicable. See details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/f8c4248df9227964dbe93762581d6732383dfdc5

(8) To reduce the number of task calls, the sub-workflow separate_data_and_index_files.wdl is removed from the agape WDL pipeline. Each workflow that called it either (a) replaces the call with a local scatter loop, or (b), when the data file and index file are already available from upstream tasks, simply drops the call and uses those files directly. Details can be seen here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/05a0366f90c2a9608d0d3d66aa1f37e68fc7ccb3

(9) hifiasm.wdl is simplified to reduce the number of task calls and the number of times the docker_image is loaded. Three tasks that were called in sequence (gfa2fa, fa2bgzip and bgzip2asm) are merged into one new task. The five kinds of GFA files are packed into an array and processed in a scatter-gather loop. Two new workflows are added: gfa2asm.wdl and gfa_asm_single.wdl. Details are here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/35e5d35365e49153548261d14cdf8e9184c0f91f
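Items (1) to (3) above replace CohortInfo with a single Array[SampleInfo] whose entries carry an optional `affected` flag. As a rough illustration, here is a Python sketch (not the repo's WDL) that models SampleInfo as a dataclass. Only `affected` comes from the thread; `name` is a hypothetical field, and the two-list input to `from_cohort` is an assumption about how CohortInfo grouped samples before the change.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SampleInfo:
    name: str                        # hypothetical field, for illustration only
    affected: Optional[bool] = None  # mirrors the new WDL member `Boolean? affected`


def from_cohort(affected_names: List[str], unaffected_names: List[str]) -> List[SampleInfo]:
    """Flatten a CohortInfo-style pair of lists (assumed shape) into one list."""
    return ([SampleInfo(n, affected=True) for n in affected_names]
            + [SampleInfo(n, affected=False) for n in unaffected_names])


def split_by_status(samples: List[SampleInfo]) -> Tuple[List[str], List[str]]:
    """Partition a single list of samples; an unset `affected` counts as unaffected."""
    affected = [s.name for s in samples if s.affected]
    unaffected = [s.name for s in samples if not s.affected]
    return affected, unaffected


if __name__ == "__main__":
    cohort = from_cohort(["child"], ["mother", "father"])
    print(split_by_status(cohort))  # (['child'], ['mother', 'father'])
```

Treating an unset `affected` as unaffected is itself an assumption; in WDL the same choice would typically be written with `select_first([s.affected, false])`.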

cbi-star commented 1 year ago

The newly created hifiasm/tasks folders contain several major changes and additions: (1) the hifiasm-related WDL files are modified and moved here; (2) sample_hifiasm.cohort.wdl is created/modified to handle sample-level hifiasm; (3) trio_hifiasm.cohort.wdl is created/modified to handle trio-level hifiasm; (4) triobev.wdl is created to test yak triobin/trioeval independently; (5) yak.wdl takes an additional input, trial.parents_list, to simplify the yak calculation. More details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/ef14de3eef4ecd58f6c03160d2a9dcce0ec7f938
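yak builds k-mer databases from the parents' reads, which hifiasm's trio mode then uses to bin the child's reads. The thread does not define trial.parents_list, so the Python sketch below only assumes it lists each parent sample once. Under that assumption, yak counting runs once per parent even when siblings share parents. The `Sample` type and its `parents` field are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    name: str
    parents: List[str] = field(default_factory=list)  # hypothetical field


def parents_list(samples: List[Sample]) -> List[str]:
    """Each parent name once, in first-seen order (assumed meaning of parents_list)."""
    # dict.fromkeys keeps insertion order and drops duplicates.
    return list(dict.fromkeys(p for s in samples for p in s.parents))


if __name__ == "__main__":
    family = [
        Sample("kid1", ["mom", "dad"]),
        Sample("kid2", ["mom", "dad"]),
        Sample("mom"),
        Sample("dad"),
    ]
    print(parents_list(family))  # ['mom', 'dad']
```

With two siblings, iterating over (child, parent) pairs would schedule four yak jobs; a deduplicated list schedules two.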

cbi-star commented 1 year ago

Removed the if (cohort_run) condition in trial.wdl so that singletons can also run slivar. See details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/24b13f1429905dfcbb9cdd7d71ec19c2dd7cfccc#diff-342d68bac97759a0786e6943ae9532c3844ee12d9006371019fd043c1069510b

cbi-star commented 1 year ago

A new workflow, cohort_thin.wdl, is created to run cohort/cohort.wdl standalone. It uses a new data structure that greatly reduces the number of lines in input.json. See details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/040772f5890d606e1eadece1709a9069b982cad5

cbi-star commented 1 year ago

Two standalone workflows are added, along with a newly defined struct, PacBio. See details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/72d9ffc6f4b2736522b2427dd510f432aefc5a56

cbi-star commented 1 year ago

(1) Names are added to the IndexedData outputs of sample.trial.wdl and smrcells.trial.wdl; (2) the hifiasm WDL files are removed from the /sample/tasks folder, because the new hifiasm/tasks folder and workflows (see above) already handle both sample-level and trio-level hifiasm computation. For details see here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/1a501bf5057c8bdf4a68eae692ed11d7ac5a6065

cbi-star commented 1 year ago

Designed a set of new data structures (PacBioInfo and PacBioSampInfo) to simplify the standalone/entry workflows (agape_thin.wdl, cohort_thin.wdl and famcohort.wdl), and removed unused workflows. See details here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/6685fb8b17ec98720559b5b8d4ad0e84655c9382

cbi-star commented 1 year ago

A flowchart outlining the relationships and dependencies among the workflows of the restructured pipeline (agape) is attached here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/blob/main/pacbio-wdl-workflow-design-2022-CharlieBi.pdf

cbi-star commented 1 year ago

Adjusted the relative paths used in imports here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/9e77ad72bdd9da34f88113bd64558c1915389669

cbi-star commented 1 year ago

To reduce task calls, the sub-workflow separate_data_and_index_files.wdl is deleted from the agape pipeline (as this repo is called). Every workflow that called it either replaces the call with a local scatter loop or, when the data file and index file are already available from upstream tasks, drops the call and uses those files directly. Details can be seen here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/05a0366f90c2a9608d0d3d66aa1f37e68fc7ccb3
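A local scatter loop in place of separate_data_and_index_files.wdl can be modeled in Python as below. This is a sketch, not the repo's code: the IndexedData fields `datafile` and `indexfile` are assumed from the wording of this comment, and `name` reflects the later commit that added names to IndexedData outputs. The loop mimics WDL's rule that values declared inside a scatter become arrays, in input order, when referenced outside it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexedData:
    name: str       # sample or file label
    datafile: str   # e.g. a BAM or VCF path (illustrative)
    indexfile: str  # the matching index path (illustrative)


def split_data_and_index(items: List[IndexedData]) -> Tuple[List[str], List[str]]:
    """Emulate a local WDL scatter: each iteration is one shard, and the
    per-shard values are gathered into arrays that preserve input order."""
    datafiles: List[str] = []
    indexfiles: List[str] = []
    for item in items:
        datafiles.append(item.datafile)
        indexfiles.append(item.indexfile)
    return datafiles, indexfiles


if __name__ == "__main__":
    bams = [IndexedData("s1", "s1.bam", "s1.bam.bai"),
            IndexedData("s2", "s2.bam", "s2.bam.bai")]
    print(split_data_and_index(bams))
```

Because the split happens inline, no extra sub-workflow call is needed; when an upstream task already outputs the data and index files separately, even this loop can be skipped.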

cbi-star commented 1 year ago

hifiasm.wdl is simplified to reduce the number of task calls and the number of times the docker_image is loaded. Three tasks that were called in sequence (gfa2fa, fa2bgzip and bgzip2asm) are merged into one task. The five kinds of GFA files are packed into an array and processed in a scatter-gather loop.

Two new workflows are added: gfa2asm.wdl and gfa_asm_single.wdl. Details are here: https://github.com/cbi-star/pb-human-wgs-workflow-wdl/commit/35e5d35365e49153548261d14cdf8e9184c0f91f
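The merge of gfa2fa, fa2bgzip and bgzip2asm into one scattered task can be sketched in Python as follows. Assumptions: gzip stands in for bgzip, the third step is modeled as simple contig statistics because the thread does not say what bgzip2asm computes, and `merged_gfa_task` and the GFA labels are hypothetical. If each of the three old tasks ran once per GFA file, five files meant 15 task calls (and 15 container starts); the merged task needs 5.

```python
import gzip
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AsmResult:
    label: str
    fasta_gz: bytes  # gzip output, standing in for bgzip
    n_contigs: int
    total_bp: int


def gfa_to_fasta(gfa_text: str) -> str:
    """Keep only GFA segment lines (S<TAB>name<TAB>sequence) and emit FASTA."""
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0] == "S":
            records.append(f">{fields[1]}\n{fields[2]}\n")
    return "".join(records)


def merged_gfa_task(label: str, gfa_text: str) -> AsmResult:
    """Hypothetical single task doing the work of three sequential tasks."""
    fasta = gfa_to_fasta(gfa_text)            # step 1: GFA -> FASTA
    fasta_gz = gzip.compress(fasta.encode())  # step 2: compress
    seqs = fasta.splitlines()[1::2]           # every record is header + one sequence line
    return AsmResult(label, fasta_gz, len(seqs), sum(map(len, seqs)))  # step 3: stats


def scatter_gather(gfas: Dict[str, str]) -> List[AsmResult]:
    # One shard (one container start) per GFA file instead of three.
    return [merged_gfa_task(label, text) for label, text in gfas.items()]
```

Usage: pass a dict mapping each of the five GFA labels to its file contents; the returned list is the gathered output, one result per GFA file, in the dict's order.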