Closed by SAN-AU 1 week ago
Hi @SAN-AU,
The `LOW_DEPTH` filter is mainly there to indicate that a variant might have been called at very low coverage (e.g. at the end of the amplicon).
iii. Medaka uses a neural network to decide whether something is a variant or not and does not output "suboptimal" bases. Therefore, filtering on quality won't improve F1 in the vast majority of cases, but rather will just reduce recall. However, if users want to filter their VCFs after running the workflow, they can of course do so.

I hope this was helpful. Please let us know if you have any other questions!
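For anyone who does want to filter after the fact, the QUAL column of the output VCF can be thresholded with a few lines of standard-library Python. This is only a minimal sketch; the threshold of 10 and the inline records below are illustrative, not workflow defaults.

```python
def filter_vcf_lines(lines, min_qual=10.0):
    """Keep header lines plus records whose QUAL meets the threshold."""
    kept = []
    for line in lines:
        if line.startswith("#"):          # header lines pass through unchanged
            kept.append(line)
            continue
        fields = line.split("\t")
        qual = fields[5]                  # QUAL is the 6th VCF column
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept

# Tiny inline example: two records, one below the threshold.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "amplicon1\t101\t.\tA\tG\t42.0\tPASS\t.",
    "amplicon1\t202\t.\tC\tT\t3.1\tPASS\t.",
]
filtered = filter_vcf_lines(vcf)
# The record at POS 202 (QUAL 3.1) is dropped.
```

The same result can be had from the command line with bcftools, e.g. `bcftools view -e 'QUAL<10' variants.vcf`, if it is installed.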
Thank you for clarifying those aspects - appreciate it!
Happy to help!
Ask away!
Using the desktop application, I've completed several analyses of multi-gene amplicon samples mapped to reference sequences in variant calling mode. This has worked well, but I have a couple of general questions about the metrics in the workflow report and the variant calling process that would give a newcomer like myself better context for interpreting the outputs.
1. Workflow report

What does the 'Mean Acc.' value represent? Assuming it stands for mean accuracy, is this related to the accuracy of the read alignment to the reference?
i. From the specified number of reads selected for initial alignment, only 150 of these reads (by default) are then used for variant calling; does this translate to 300x read depth once forward and reverse strands are taken into account?
ii. And for variants to 'pass' and be incorporated into the consensus, a read depth of only 20 (by default) at that position is required?
iii. There is no minimum quality score a variant needs to reach in order to 'pass'? Pardon my ignorance, but would it be worth considering an option to filter out low-quality variants to improve robustness?
Thank you! Appreciate the guidance.