Closed JWDebler closed 1 year ago
Hi there!
I would like to get the consensus sequence as a fasta file and might as well output the bam file since it is generated already.
This has been added in v0.1.3.
The html report could also be enhanced by showing the reference sequence and highlighting any variants.
Thanks for the feedback! Could you please give a few more details on what this would ideally look like / what you would use it for? Specifically, would you need an interactive, zoomable "genome-browser-style" view (potentially also showing the pileup) or would a more static plot simply showing where the variants occur be enough?
Hi Juli, well, if I could wish for it, I'd like to be able to supply not just a reference sequence but also a reference annotation (in GFF). The output report would then display the mapping overlaid with the annotation, plus a track showing potential variants. Yes, a genome browser style visualisation (with annotation) would be great! Something like this:
Cheers
Hi @JWDebler! We looked into the genome browser request and explored a couple of options. However, since users can have many samples as well as multiple references of arbitrary length and annotation complexity, including something like this in the report risks dramatically blowing up the size of the HTML and making it unresponsive. We therefore won't pursue this further for now, sorry. If users need to perform such downstream analysis, we recommend using dedicated software like IGV or similar.
The workflow could, however, provide the option to output joint bam and vcf files, with all samples and/or all targets in one bam/vcf. This would make loading it into IGV more straightforward.
Also, I note that somewhere along the way the workflow renames chromosome names. In the example data provided, the reference sequence name contains ":", which becomes "_" in the #CHROM column of the vcf produced by Medaka. As a consequence, IGV will not recognise the reference sequence name and the chromosome where the variants were found as the same, and will display "no variants found".
Another issue I encounter is that medaka apparently isn't using the sample sheet, but labels the sample in the vcf generically with "SAMPLE". This makes it difficult to compare samples in IGV as they all have the same name.
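For anyone hitting the generic "SAMPLE" label, a possible workaround is to rewrite the sample column name in the VCF header before loading the file into IGV. The following is only a minimal sketch: the function name `rename_vcf_sample` and the example sample name `barcode01` are made up for illustration and are not part of the workflow or of Medaka.

```python
def rename_vcf_sample(lines, new_name, old_name="SAMPLE"):
    """Yield VCF lines with the sample column renamed in the #CHROM header.

    Hypothetical helper: `old_name` defaults to the generic "SAMPLE" label
    described in this thread; `new_name` would come from your sample sheet.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            newline = "\n" if line.endswith("\n") else ""
            fields = line.rstrip("\n").split("\t")
            # Fields 0-8 are the fixed VCF columns (CHROM ... FORMAT);
            # everything after that is a sample name.
            fields[9:] = [new_name if f == old_name else f for f in fields[9:]]
            yield "\t".join(fields) + newline
        else:
            # Meta lines ("##...") and variant records pass through unchanged.
            yield line


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n",
        "ref\t1\t.\tA\tG\t50\tPASS\t.\tGT\t1\n",
    ]
    print("".join(rename_vcf_sample(vcf, "barcode01")), end="")
```

This only touches the header line, so it works on uncompressed VCF text; a compressed file would need to be decompressed first.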
Thanks for the feedback, @warthmann! We will address these points in the near future.
v0.3.0 introduced the following changes to make the output play nicer with IGV:
the workflow could, however, provide the option to output joint bam and vcf files
The option --combine_results has been added to do this.
Also, I note that somewhere along the way the workflow renames chromosome names.
Mosdepth and Medaka cannot deal with : or * in the sequence names, which is why we sanitise them. The workflow now also outputs the reference file with the sanitised sequence names so that it can be used when loading data into IGV.
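To illustrate the renaming, here is a minimal sketch of sanitising sequence names in a FASTA file so that they match the #CHROM values in the VCF. It is not the workflow's actual code. The thread only confirms that ":" becomes "_"; replacing "*" with "_" as well is an assumption, and the function names are hypothetical.

```python
import re

# Assumption: both ":" and "*" are replaced with "_".
# The thread only confirms this for ":".
UNSAFE_CHARS = re.compile(r"[:*]")


def sanitise_name(name: str) -> str:
    """Replace characters that Mosdepth and Medaka cannot handle."""
    return UNSAFE_CHARS.sub("_", name)


def sanitise_fasta(lines):
    """Yield FASTA lines with the sequence ID sanitised in each header."""
    for line in lines:
        if line.startswith(">"):
            newline = "\n" if line.endswith("\n") else ""
            header = line[1:].rstrip("\n")
            # Only the ID (text before the first space) is rewritten;
            # any free-text description after it is kept as-is.
            seq_id, sep, description = header.partition(" ")
            yield ">" + sanitise_name(seq_id) + sep + description + newline
        else:
            yield line


if __name__ == "__main__":
    fasta = [">chr1:100-200 my amplicon\n", "ACGT\n"]
    print("".join(sanitise_fasta(fasta)), end="")
```

Loading a reference renamed this way alongside the VCF should let a viewer match the sequence names, since both then carry the same sanitised ID.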
medaka apparently isn't using the sample sheet
This has been fixed.
I tested v0.3.0 and the new features worked nicely with my files! Thanks!
Operating System
Windows 10
Workflow Execution
Command line
Workflow Execution - EPI2ME Labs Versions
No response
Workflow Execution - CLI Execution Profile
Docker
Workflow Version
epi2me-labs/wf-amplicon v0.1.2-g3c36817