epi2me-labs / wf-artic

ARTIC SARS-CoV-2 workflow and reporting
https://labs.epi2me.io/

[enhancement:] wf-artic pipeline filters too many reads #65

Closed Rohit-Satyam closed 1 year ago

Rohit-Satyam commented 1 year ago

Hi @mattdmem

I would like to understand whether it is appropriate to reduce the min_len threshold to 200. A 400 bp cutoff seems too strict for ARTIC primers: with 200 bp we clearly see improved vertical coverage, resulting in more complete consensus assemblies. We sequenced 8 batches of clinical samples on a MinION, and in all runs most of the fastq_pass reads are 200-400 bp long, as shown below. We are using ARTIC V4.1 primers, and I understand that the amplicon size should ideally be around 400 bp, but I would like to understand why the default is set so stringently.

Also, in your experience, what median quality score counts as good? We obtain 11 in almost all batches.

image
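To make the length question concrete, here is a minimal Python sketch (not part of wf-artic) that reports what fraction of reads would survive a given length window. The function names are hypothetical, the FASTQ is assumed to use plain 4-line records, and the window is assumed to be inclusive at both ends, which may not match how the workflow applies min_len.

```python
import gzip


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ text."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of every record
            yield len(line.strip())


def fraction_kept(lengths, min_len, max_len):
    """Fraction of reads with min_len <= length <= max_len (assumed inclusive)."""
    lengths = list(lengths)
    if not lengths:
        return 0.0
    kept = sum(1 for n in lengths if min_len <= n <= max_len)
    return kept / len(lengths)


def lengths_from_file(path):
    """Collect read lengths from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(read_lengths(handle))
```

Calling `fraction_kept(lengths, 400, 700)` and `fraction_kept(lengths, 200, 700)` on the same run would show how many reads the lower cutoff recovers; the upper bound of 700 here is only a placeholder value.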

Besides, is it possible to cluster the samples in the "Number of reads by Sample" plot and the "Location of Ns in Final Consensus" plot by Nextclade qc.overallStatus (all bad samples together, followed by mediocre and good samples, something like the plot below that I produced in R)? It would make it easier to tell bad samples from good ones at a glance.

In addition, it would also be helpful if the Genome coverage plots could be arranged in the same order as the clusters in the plots above. That would show whether a sample performed badly due to low vertical coverage (with good horizontal coverage, so it can be rescued by re-sequencing) or due to patchy coverage (with poor horizontal coverage, so there is no hope of rescue).

image
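The ordering I have in mind could be derived roughly as follows. This is only a sketch in Python, not how the wf-artic report is built: it assumes a Nextclade TSV with `seqName` and `qc.overallStatus` columns, and `samples_by_qc_status` is a hypothetical helper name.

```python
import csv
import io

# Desired plot order: bad samples first, then mediocre, then good.
STATUS_ORDER = {"bad": 0, "mediocre": 1, "good": 2}


def samples_by_qc_status(tsv_text):
    """Return (sample, status) pairs sorted by Nextclade overall QC status."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(row["seqName"], row["qc.overallStatus"]) for row in reader]
    # Unknown statuses sort last; the sort is stable, so the original
    # sample order is preserved within each status group.
    rows.sort(key=lambda r: STATUS_ORDER.get(r[1], len(STATUS_ORDER)))
    return rows
```

The resulting sample order could then be reused as the axis order for the read-count, N-location and coverage plots so that all three line up.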

mattdmem commented 1 year ago

Thanks for this @Rohit-Satyam and sorry for the late reply. We provide the ability to customise which reads are filtered by length so that you can adjust the thresholds to your own observations and needs. The defaults serve as a guide only and have performed well in our own testing.