epi2me-labs / wf-artic

ARTIC SARS-CoV-2 workflow and reporting
https://labs.epi2me.io/

Regarding filtering of Low Frequency and Subclonal Mutations #68

Closed: Rohit-Satyam closed this issue 1 year ago

Rohit-Satyam commented 1 year ago

What happened?

I annotated the wf-artic VCF files with variant allele frequencies (VAF) using vafator and found a few subclonal and low-frequency variants (variants with VAF < 20% are considered LOW_FREQUENCY, and variants with VAF >= 20% and < 80% are considered SUBCLONAL). Are these variants retained or filtered out before the consensus FASTA is generated?

If they are not filtered, would you recommend filtering them before submitting to GISAID? If so, could you add an option to filter out anything with a VAF below 0.8 (80%)?
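For illustration, the VAF categories described above can be sketched in Python. This is a minimal sketch, not vafator's own code: the function name `classify_vaf` and the `CLONAL` label for variants at or above 80% are assumptions made here, and only the two thresholds come from the description above.

```python
def classify_vaf(vaf, low=0.2, high=0.8):
    """Label a variant by its allele frequency (a fraction between 0 and 1).

    The 0.2 and 0.8 cut-offs are the ones quoted in this issue. The
    "CLONAL" label for everything at or above `high` is a hypothetical
    name used only for this example.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF must be between 0 and 1, got {vaf}")
    if vaf < low:
        return "LOW_FREQUENCY"
    if vaf < high:
        return "SUBCLONAL"
    return "CLONAL"


if __name__ == "__main__":
    for value in (0.05, 0.2, 0.5, 0.8, 0.97):
        print(value, classify_vaf(value))
```

Note that the boundaries are half-open: a VAF of exactly 0.2 counts as SUBCLONAL and a VAF of exactly 0.8 does not.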

Operating System

ubuntu 20.04

Workflow Execution

Command line

Workflow Execution - EPI2ME Labs Versions

Workflow Version

0.3.18

Rohit-Satyam commented 1 year ago

@cjw85 @mattdmem @sarahjeeeze Do you have any thoughts on filtering variants before generating the consensus FASTA?

Rohit-Satyam commented 1 year ago

Referencing a related issue here

mattdmem commented 1 year ago

Currently we use the fieldbioinformatics package from the ARTIC Network. We'll take this feedback into account for future versions of the workflow.

corneliusroemer commented 1 year ago

@Rohit-Satyam I think you should report subclonal sites as ambiguous if they are above a coverage threshold; otherwise make them N. You should definitely not make them reference (reversions to reference are a very common artefact and badly screw up phylogenetics).

I'm not sure what you mean by filter. Filtering them out entirely might cause the site to be output as reference, which would be bad.
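The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how wf-artic or fieldbioinformatics behaves: the function `consensus_base`, the per-site `counts` dictionary, and the `min_depth=20` and `min_fraction=0.2` defaults are all hypothetical choices made for this example. The point it shows is that the reference base is never consulted, so a site cannot silently revert to reference.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus_base(counts, min_depth=20, min_fraction=0.2):
    """Call one consensus base from per-base read counts at a single site.

    counts       -- dict mapping a base ("A", "C", "G", "T") to its read count
    min_depth    -- hypothetical coverage threshold; below it the site is "N"
    min_fraction -- hypothetical minimum read fraction for a base to count
    """
    depth = sum(counts.values())
    if depth < min_depth:
        # Too little coverage to say anything: mask rather than guess.
        return "N"
    supported = frozenset(
        base for base, n in counts.items() if n / depth >= min_fraction
    )
    # Mixed sites become an ambiguity code; anything unrecognised becomes N.
    return IUPAC.get(supported, "N")


if __name__ == "__main__":
    print(consensus_base({"A": 5, "G": 3}))    # low coverage
    print(consensus_base({"A": 60, "G": 40}))  # mixed site
    print(consensus_base({"T": 98, "C": 2}))   # clear majority
```

With these defaults, a well-covered site split 60/40 between A and G is reported as R, while a site with only eight reads is masked as N regardless of what those reads say.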