CDPHE-bioinformatics / CDPHE-SARS-CoV-2

Workflows and scripts for the assembly and analysis of SARS-CoV-2 whole genome tiled amplicon sequencing.
https://cdphe-bioinformatics.github.io/CDPHE-SARS-CoV-2/
GNU General Public License v3.0

[FEATURE] Version capture consistency across workflows #29

Open molly-hetheringtonrauth opened 1 month ago

molly-hetheringtonrauth commented 1 month ago

Feature Request

For SC2_illumina_pe_assembly and SC2_ont_assembly, we have implemented a WDL struct that captures the software, software version, Docker image, and Docker image version used by each task. The workflow_version_capture task then takes the struct from each task as input and uses a Python script to write the information out as a table and a CSV file. This replaced our previous version capture method and greatly reduced the number of variables and individual Python scripts required.

We want to:

1. Use WDL structs to capture the software, software version, Docker image, and Docker image version in our SC2_lineage_calling_and_results and SC2_wastewater_variant_calling workflows.
2. Format the information as a table using a WDL function, Bash, or inline Python within the command block, instead of calling a separate Python script.
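A minimal Python sketch of the struct-to-table step, for illustration only. The struct name VersionInfo and its member names (software, software_version, docker, docker_version) are hypothetical stand-ins for whatever the assembly workflows actually define, and the record values are placeholders.

```python
from dataclasses import dataclass, asdict, fields
from typing import List

# Hypothetical Python mirror of a WDL struct such as:
#   struct VersionInfo {
#     String software
#     String software_version
#     String docker
#     String docker_version
#   }
@dataclass
class VersionInfo:
    software: str
    software_version: str
    docker: str
    docker_version: str

# Column order follows the order the dataclass fields are declared in.
COLUMNS = [f.name for f in fields(VersionInfo)]


def format_version_table(records: List[VersionInfo]) -> str:
    """Render one row per task as a tab-separated table with a header row."""
    lines = ["\t".join(COLUMNS)]
    for rec in records:
        row = asdict(rec)
        lines.append("\t".join(row[col] for col in COLUMNS))
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values, not real tool or image versions.
    records = [
        VersionInfo("tool_a", "1.0", "example/tool_a", "1.0"),
        VersionInfo("tool_b", "2.3", "example/tool_b", "2.3"),
    ]
    print(format_version_table(records))
```

Keeping one record type with fixed columns is what lets a single formatter serve every task, which is the reduction in variables the struct approach aims for.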

Solution

1. Use our SC2 assembly workflows as a template and implement WDL structs in the other two workflows.
2. We could possibly use the WDL write_json function (see the write_json documentation) to write the structs to a JSON file, which can then be formatted as a table.
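If write_json is used, the formatting could be a few lines of inline Python in the command block. The sketch below assumes write_json serializes an Array of version structs as a JSON array of objects keyed by member name (as the WDL spec describes for arrays and structs); the column names are the same hypothetical ones used for illustration, not the repo's actual struct members.

```python
import csv
import json
from pathlib import Path

# Hypothetical struct member names; adjust to the real struct definition.
COLUMNS = ["software", "software_version", "docker", "docker_version"]


def json_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a JSON array of version records to CSV and return the row count.

    Assumes the input looks like the output of write_json() on an Array of
    version structs: [{"software": ..., "software_version": ..., ...}, ...].
    Missing keys are written as empty strings.
    """
    records = json.loads(Path(json_path).read_text())
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        for rec in records:
            writer.writerow({col: rec.get(col, "") for col in COLUMNS})
    return len(records)
```

Because it only uses the standard library, the same body could be pasted into a Python heredoc inside the task's command block rather than shipped as a separate script.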

Upstream effects

none

Downstream effects

none

molly-hetheringtonrauth commented 1 month ago

Because this Python script is used by three different repos, we may want to move it into the analysis_tools repo, or package it in a Docker container that each WDL workflow can call.

molly-hetheringtonrauth commented 1 month ago

For sample-level workflows, the version capture file is overwritten for each sample during the transfer task (when the file is transferred to the bucket). Do we want to output a unique file for each sample? We could add logic to the Python script so that, for sample-level workflows, the sample name is added to the version capture output file name. We could then aggregate the files downstream in BigQuery. @sam-baird
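One way the per-sample logic might look, as a hedged sketch: the file name pattern and the sample_name column are hypothetical choices, not an existing convention in the repo. Adding the sample name as a column as well as in the file name keeps each row traceable once the per-sample files are concatenated downstream.

```python
from typing import Dict, List, Optional


def version_capture_filename(workflow_name: str, sample_name: Optional[str] = None) -> str:
    """Build the version capture CSV name (hypothetical pattern).

    Sample-level runs include the sample name so that each sample's file
    gets a unique name and is not overwritten during the transfer task.
    """
    if sample_name:
        return f"version_capture_{workflow_name}_{sample_name}.csv"
    return f"version_capture_{workflow_name}.csv"


def add_sample_column(rows: List[Dict[str, str]], sample_name: str) -> List[Dict[str, str]]:
    """Return copies of the rows with sample_name as the first column."""
    return [{"sample_name": sample_name, **row} for row in rows]


if __name__ == "__main__":
    print(version_capture_filename("wf"))        # workflow-level run
    print(version_capture_filename("wf", "S1"))  # sample-level run
```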