Open molly-hetheringtonrauth opened 1 month ago
Because this Python script is used by three different repos, we may want to put it into the anlaysis_tools repo, or we could put it into a Docker container that could be called in each WDL.
For sample-level workflows, the version capture file is overwritten for each sample during the transfer task (when transferring the file to the bucket), so do we want to output a unique file for each sample? Add logic to the Python script so that, for sample-level workflows, the sample name is added to the version capture output file name. Maybe we could aggregate downstream in BigQuery. @sam-baird
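The per-sample naming logic suggested above could be a small helper in the Python script; a minimal sketch (the function and filename pattern are illustrative, not the script's actual names):

```python
# Hypothetical sketch: give each sample-level workflow run a unique version
# capture filename by embedding the sample name, so the transfer task no
# longer overwrites the file. Function and filename pattern are illustrative.
def version_capture_filename(workflow_name, sample_name=None):
    """Return a per-sample CSV filename when a sample name is supplied."""
    base = f"version_capture_{workflow_name}"
    if sample_name:  # sample-level workflow: append the sample name
        return f"{base}_{sample_name}.csv"
    return f"{base}.csv"
```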
Feature Request
For SC2_illumina_pe_assembly and SC2_ont_assembly, we have implemented a WDL struct to capture the software, software version, docker, and docker version used by each task. The workflow_version_capture task then takes the struct from each task as input and uses a Python script to output the information in tabular format as a CSV file. This replaced our previous method for version capture and greatly reduced the number of variables and individual Python scripts required. We want to:

1. Implement the use of WDL structs to capture the software, software version, docker, and docker version in our SC2_lineage_calling_and_results and SC2_wastewater_variant_calling workflows.
2. Use either a WDL function, bash code, or Python code within the command block to format the information into a tabular format, instead of using a separate Python script.
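The formatting step the current Python script performs amounts to flattening the per-task struct records into CSV rows; a minimal sketch, where the example record and field names are assumptions based on the description above:

```python
import csv
import io

# Illustrative only: per-task version-capture records with the struct fields
# described above (task, software, software version, docker, docker version).
records = [
    {"task": "pangolin", "software": "pangolin", "software_version": "4.3",
     "docker": "staphb/pangolin", "docker_version": "4.3-1"},
]

# Flatten the records into a CSV table.
buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=["task", "software", "software_version", "docker", "docker_version"],
)
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```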
Solution
For 1 - Use our SC2 assembly workflows as an example and implement the use of WDL structs for the other two workflows.
For 2 - We could possibly use the write_json WDL function (see the write_json documentation).
Upstream effects
none
Downstream effects
none
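Since write_json serializes a WDL value (e.g. an array of version-capture structs) to a JSON file, option 2 could reduce to a few lines of inline Python in the command block that convert that JSON to CSV. A hedged sketch, assuming the JSON is a list of flat dicts with one entry per task (the function name and field layout are assumptions):

```python
import csv
import json

# Hypothetical sketch: convert the JSON file produced by WDL's write_json()
# into a CSV table. This could run inline in a task's command block,
# e.g. via `python3 - <<'EOF' ... EOF`.
def json_to_csv(json_path, csv_path):
    with open(json_path) as f:
        rows = json.load(f)  # assumed: a list of flat dicts, one per task
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```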