CDPHE-bioinformatics / CDPHE-SARS-CoV-2

Workflows and scripts for the assembly and analysis of SARS-CoV-2 whole genome tiled amplicon sequencing.
https://cdphe-bioinformatics.github.io/CDPHE-SARS-CoV-2/
GNU General Public License v3.0

Update Medaka for new basecall models and model auto-detection #76

Closed: sam-baird closed this pull request 2 weeks ago

sam-baird commented 2 weeks ago

This PR closes #74

Aim, context, and functionality 🎯

This PR updates the staphb/artic Docker image used by the Medaka task in the ONT assembly workflow from version 1.2.4-1.11.1 to 1.2.4-1.12.0 so that the task supports the most recent basecaller and medaka models. The medaka model can now be auto-detected from the FASTQ. I kept medaka_model as an optional input because some FASTQ files do not carry model information in their headers (for example, files downloaded from SRA).
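The PR does not say how detection is implemented inside the updated image, so the Python sketch below only illustrates the idea. It assumes Dorado-style read headers that carry the basecaller model as a `basecall_model_version_id=` tag (the tag name and regex are assumptions), and `to_medaka_name` is a hypothetical string mapping onto the model names listed under Testing, not medaka's own lookup. A header without the tag, as in many SRA downloads, yields `None`, which is the case the optional medaka_model input exists for.

```python
import re
from typing import Optional

# Assumption: headers carry the model as a key=value tag, e.g.
# "basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.3.0".
MODEL_TAG = re.compile(r"model_version_id=(\S+)")


def model_from_header(header: str) -> Optional[str]:
    """Return the basecaller model named in a FASTQ header, or None if absent."""
    match = MODEL_TAG.search(header)
    return match.group(1) if match else None


def to_medaka_name(basecaller_model: str) -> str:
    """Hypothetical mapping, e.g.
    'dna_r10.4.1_e8.2_400bps_hac@v4.3.0' -> 'r1041_e82_400bps_hac_v4.3.0'."""
    chemistry, _, version = basecaller_model.removeprefix("dna_").partition("@")
    fields = []
    for field in chemistry.split("_"):
        # Strip dots only from the pore (r10.4.1) and kit (e8.2) fields.
        if field[:1] in ("r", "e") and field[1:2].isdigit():
            field = field.replace(".", "")
        fields.append(field)
    name = "_".join(fields)
    return f"{name}_{version}" if version else name
```

For example, a header containing `basecall_model_version_id=dna_r10.4.1_e8.2_400bps_hac@v4.3.0` maps to `r1041_e82_400bps_hac_v4.3.0`, while a bare SRA-style header such as `@SRR123.1 1 length=150` returns `None`.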

Workflow Changes ✅

Upstream Effects

None

Input Changes

medaka_model is now optional and should generally be left blank so that the model is auto-detected; this avoids running medaka with the wrong model.
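A minimal sketch of the input precedence, assuming (hypothetically; the function and argument names are illustrative, not the workflow's) that a non-empty medaka_model overrides whatever was detected and that detection falls back to an error when nothing is found:

```python
from typing import Optional


def choose_medaka_model(medaka_model: Optional[str], detected_model: Optional[str]) -> str:
    """Pick the model to run, assuming an explicit input overrides auto-detection."""
    if medaka_model:  # optional input was set (None or "" count as blank)
        return medaka_model
    if detected_model:  # input left blank: use the model recorded in the FASTQ
        return detected_model
    raise ValueError("No model found in FASTQ headers; set medaka_model explicitly.")
```

Under this precedence, a stale medaka_model value silently wins over a correctly detected model, which is why leaving the input blank is the safer default.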

Output Changes

The assembly version capture file has a new row that records the medaka model version.
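The PR does not describe the file's layout. As a hypothetical illustration only, assuming a simple comma-separated file with `software` and `version` columns (both the header and the function name are assumptions), the new row could be appended like this:

```python
import csv
import io


def render_version_capture(rows: list[tuple[str, str]], medaka_model: str) -> str:
    """Render a hypothetical software/version CSV and append a medaka_model row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["software", "version"])  # assumed header, not the workflow's
    writer.writerows(rows)  # existing rows, passed through unchanged
    writer.writerow(["medaka_model", medaka_model])  # the new row this PR adds
    return buffer.getvalue()
```

Recording the resolved model as its own row makes it possible to tell, after the fact, which model auto-detection actually picked for a run.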

Downstream Effects

Testing 🛠️

test_cov_2205_grid (new basecaller version: r1041_e82_400bps_hac_v4.3.0)
cov_2205_grid (old basecaller version: r1041_e82_400bps_hac_v4.2.0)

Test(s) performed:

Ran test_cov_2205_grid with and without model auto-detection: the workflow completed without crashing, and both runs produced the same outputs. Ran cov_2205_grid with model auto-detection and compared its summary results against previous results using TheiaValidate.
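The PR does not say how the outputs were checked for equality. One simple check, sketched here under the assumption that the outputs of interest are consensus FASTA files, is to parse both runs and list the records whose sequences differ. The function names are hypothetical, and this is separate from the TheiaValidate comparison.

```python
from pathlib import Path


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record ID (first word after '>') to its uppercase sequence."""
    records: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(chunks) for name, chunks in records.items()}


def differing_records(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """IDs whose sequences differ, or that are present in only one run."""
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


def compare_files(path_a: str, path_b: str) -> list[str]:
    """Compare two consensus FASTA files; an empty list means identical records."""
    return differing_records(parse_fasta(Path(path_a).read_text()),
                             parse_fasta(Path(path_b).read_text()))
```

Calling `compare_files("auto_detect.fasta", "explicit_model.fasta")` (hypothetical file names) on the two runs would return an empty list when every consensus sequence matches.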

Developer Checklist πŸ‘·β€β™€οΈ

Reviewer Checklist 🔍