Closed chatla01 closed 6 months ago
Hi @chatla01,
The workflow builds the output sequence using flye as the assembler and medaka as the consensus tool.
Both of these tools work only from the information stored in the fastq files (not the primary instrument data). Their ability to resolve homopolymers is therefore tied to the basecaller output, that is, the evidence presented in the fastq files.
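One way to look at that evidence directly is to tally how long the homopolymer runs are in the basecalled reads. The following is only a minimal sketch, not part of the workflow: the helper names `read_sequences` and `homopolymer_length_counts`, the `min_len` threshold, and the three example reads are all made up for illustration, and it assumes plain 4-line FASTQ records.

```python
import re
from collections import Counter

def read_sequences(fastq_lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip().upper()

def homopolymer_length_counts(sequences, base="A", min_len=10):
    """Count how often each run length (>= min_len) of `base` occurs."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")
    counts = Counter()
    for seq in sequences:
        for match in pattern.finditer(seq):
            counts[len(match.group())] += 1
    return counts

# Hypothetical reads: the same locus called with 22, 19 and 22 A's.
example = [
    "@read1", "GATTC" + "A" * 22 + "GCTA", "+", "I" * 31,
    "@read2", "GATTC" + "A" * 19 + "GCTA", "+", "I" * 28,
    "@read3", "GATTC" + "A" * 22 + "GCTA", "+", "I" * 31,
]
counts = homopolymer_length_counts(read_sequences(example))
print(sorted(counts.items()))
```

If the reads themselves disagree widely on the run length, flye and medaka have no firm signal to settle on. Reads from the reverse strand would show the run as T's, so a real check would need to look at both bases.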
You can get differing results from multiple runs of the workflow because the code samples the input data at random. Typically this does not affect the result, as the information from different random samples is generally consistent. However, the basecaller output varies more around long homopolymers, so different random samples can behave differently.
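The effect can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the workflow's actual sampling or consensus code: called lengths are modelled as Gaussian noise around the true length, the "consensus" is simply the most common called length, and the function names, noise spreads and sample sizes are hypothetical.

```python
import random
from collections import Counter

def simulate_called_lengths(true_length, n_reads, spread, rng):
    """Toy model: each read reports the run length with Gaussian noise."""
    return [max(1, round(rng.gauss(true_length, spread))) for _ in range(n_reads)]

def consensus_length(lengths):
    """Most common called length; ties go to the shorter length."""
    counts = Counter(lengths)
    best = max(counts.values())
    return min(length for length, c in counts.items() if c == best)

def subsample_consensus(lengths, sample_size, seed):
    """Consensus from one random subsample, as one 'run' of the toy model."""
    rng = random.Random(seed)
    return consensus_length(rng.sample(lengths, sample_size))

rng = random.Random(0)
# Assumed noise levels: tight for a short run, wide for a long run.
short_hp = simulate_called_lengths(5, 1000, 0.3, rng)
long_hp = simulate_called_lengths(25, 1000, 3.0, rng)

# Thirty simulated runs, each drawing a different random subsample of 50 reads.
short_results = {subsample_consensus(short_hp, 50, seed) for seed in range(30)}
long_results = {subsample_consensus(long_hp, 50, seed) for seed in range(30)}
print("short homopolymer consensus lengths:", sorted(short_results))
print("long homopolymer consensus lengths:", sorted(long_results))
```

In this model the short run gives the same answer in every subsample, while the long run can give a different length from one subsample to the next, which matches the run-to-run differences in the number of A's reported above.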
You may wish to open a discussion on dorado to discuss improvements to the basecalling.
Thank you @Chris. I will look into opening a dorado discussion.
Operating System
macOS
Other Linux
Ubuntu 20.04
Workflow Version
v0.3.1
Workflow Execution
EPI2ME cloud agent
EPI2ME Version
v0.3.1
CLI command run
I am running the EPI2ME Labs wf-clone-validation v0.3.1 workflow on a Mac as well as on a GridION. Homopolymeric regions longer than 20 bp and poly-A tails in plasmids are not resolved. I used the RBK114.24 kit with 10.4.1 flow cells, and basecalled with Dorado in super accuracy mode with modified bases turned on. I feed the fastq_pass folder to wf-clone-validation v0.3.1 on the GridION and/or a MacBook Pro with M1. In either case it does not resolve homopolymeric regions longer than 20 bp or poly-A tails, and when it is run multiple times a different number of A's is captured each time. I have attached some screenshots. Any suggestions to overcome this problem?
Workflow Execution - CLI Execution Profile
standard (default)
What happened?
Homopolymeric regions more than 20 bp and poly A tails in plasmids are not resolved
Relevant log output
Application activity log entry