CDPHE-bioinformatics / CDPHE-SARS-CoV-2

Workflows and scripts for the assembly and analysis of SARS-CoV-2 whole genome tiled amplicon sequencing.
https://cdphe-bioinformatics.github.io/CDPHE-SARS-CoV-2/
GNU General Public License v3.0

[REQUEST] Update Aggregate Lineages to use CDC grouping URL #13

Open molly-hetheringtonrauth opened 4 months ago

molly-hetheringtonrauth commented 4 months ago

Feature Request

Currently, we have to manually update the cdc_lineage_groups.json file each week; it is an input to lineage_calling_and_results.wdl. Sam has written code for the cloud-run-aggregate-lineages repo that downloads the CDC aggregate lineage groupings JSON directly from the CDC COVID variant dashboard website. We want to stop manually updating cdc_lineage_groups.json and incorporate Sam's automated code into lineage_calling_and_results.wdl.

Solution

Pull the code from the cloud-run-aggregate-lineages repo that automates downloading the JSON from the CDC dashboard. Incorporate it into the concat_seq_metrics_and_lineage_results.py script for the results_table task in lineage_calling_and_results.wdl. Update the task and WDL inputs as needed.
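As a rough illustration of the download step, here is a minimal sketch. It is not Sam's code from cloud-run-aggregate-lineages. The URL is a placeholder (the real dashboard URL is not given in this issue), the function names are hypothetical, and the JSON is treated as an opaque document because its structure is not described here. The optional local fallback is also an assumption, meant to keep the results_table task from failing outright if the download does not succeed.

```python
import json
import urllib.request

# Hypothetical placeholder: the real CDC dashboard URL lives in the
# cloud-run-aggregate-lineages repo and is not given in this issue.
CDC_LINEAGE_GROUPS_URL = "https://example.invalid/cdc_lineage_groups.json"


def fetch_json(url, timeout=30):
    """Download and parse a JSON document from `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def load_lineage_groups(url=CDC_LINEAGE_GROUPS_URL, fallback_path=None, fetch=fetch_json):
    """Return the CDC lineage groupings, preferring a fresh download.

    If the download or parsing fails and `fallback_path` is given, the
    local copy is loaded instead; otherwise the error is re-raised.
    """
    try:
        return fetch(url)
    except (OSError, ValueError) as err:  # network errors, malformed JSON
        if fallback_path is None:
            raise
        print(f"Download failed ({err}); using local copy {fallback_path}")
        with open(fallback_path, encoding="utf-8") as handle:
            return json.load(handle)
```

In the script, a call such as `load_lineage_groups(fallback_path="cdc_lineage_groups.json")` could replace reading the manually updated file, with the existing file input kept as an optional fallback in the WDL task.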

Downstream effects

Code duplication: we also aggregate lineages based on CDC grouping in the wastewater heatmap Colab notebook.