Closed Marc-Higgins closed 4 months ago
Hi Marc!
The consensus sequences and BAM files for each sample are copied to the output directory of the workflow. Please let us know if you have any other questions.
Hi @julibeg - sorry for the delay here.
In my consensus file e.g.:
output/barcode<xx>/consensus/medaka.consensus.fasta
file, the reported consensus sequences are simply just the inputted references combined into one file.
For example I inputted 12 reference sequences, and in a particular barcode the wf-amplicon noted presence of 2 distinct amplicons. However in the medaka.consensus.fasta file for this barcode - all 12 reference sequences are present. Is this an expected feature? Ideally, I would like to be able to view the generated consensus sequences (not just the references), with variants or indels as determined by the workflow. Thank you.
The workflow per default assumes that all reference sequences are present in each barcode / sample. You can change this behaviour by providing a ref
column in the sample sheet. Please see the second question in the FAQs for details.
I realise that this is counter-intuitive and we might change this in the future to only emit a consensus for reference sequences that had more than a certain number of reads mapped to them.
Thanks for the clarification @julibeg.
Why are the entries in the fasta file within the "consensus" folder without any variants, even when variants are reported in the emitted VCF? Thanks again for your assistance.
Why are the entries in the fasta file within the "consensus" folder without any variants, even when variants are reported in the emitted VCF?
Could it be that these variants didn't pass the depth filter threshold? They would show up under the "Low depth" tab in the Variants section of the report (and in the VCF they would have LOW_DEPTH
in the VCFs FILTER
column.
If these variants have PASS
in the FILTER
column and are still missing from the consensus, then there might be a bug in the workflow. In this case could you please provide me with
params.json
file found in output directory of this runThanks!
@julibeg - they are low LOW_DEPTH denoted in the filter column. This is clarified - thank you!
Hi folks,
I am using wf-amplicon to extract consensus sequences which may align to one of many references. The report is very useful in that I can see which reference sequences are present in my sample, and at what proportion. Is there a way to extract this information programmatically from the outputted files?
Ideally, I'd like to be able to pull the consensus sequence(s) generated for each present sample, and also the proportion of reads mapping to each amplicon in my sample.
Thanks! Marc.