Closed valery-shap closed 3 months ago
The workflow produces a file with per-read statistics, which is used to generate the report file. The workflow doesn't appear to publish it as a final output, though; we can change that.
Thank you very much for the reply. It would be great!
The alternative is to use Dorado alone, and the only route (if I understood correctly) is: convert the fast5 files to pod5 files, run Dorado with BAM output, run "dorado summary" (it works only with BAM files) to get a Guppy-like "sequencing_summary" report, convert the BAM files to FASTQ files using samtools, and finally extract the qscores from the sequencing_summary file.
Does the workflow use the same route to get the statistics? In other words, is the per-read qscore in the report generated by the workflow the same as the value in the "mean_qscore_template" column of the output table of the dorado summary command?
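The last step of that route (pulling per-read qscores out of the summary table) can be sketched in Python as follows. This is a minimal sketch that assumes the summary is a tab-separated file with a header row; the "mean_qscore_template" column name comes from this thread, while the "read_id" column name and the sample rows are assumptions made up for illustration.

```python
import csv
import io


def read_mean_qscores(handle, id_column="read_id", q_column="mean_qscore_template"):
    """Return {read id: mean qscore} from a tab-separated summary table.

    The column names are parameters because they are assumptions here;
    check them against the header of your own summary file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_column]: float(row[q_column]) for row in reader}


# Made-up example table standing in for a real summary file.
example = io.StringIO(
    "read_id\tsequence_length_template\tmean_qscore_template\n"
    "read-1\t1500\t12.5\n"
    "read-2\t800\t9.8\n"
)
qscores = read_mean_qscores(example)
print(qscores)
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory example.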
Also, I realized that estimating the average Qscore after basecalling with, for example, seqkit is not right, because (https://github.com/shenwei356/seqkit/issues/328): "you can’t just do simple arithmetic mean of all the qscores, because it won’t be a representation of the mean error rate then." See also https://community.nanoporetech.com/posts/what-is-the-base-value-for
Sorry for the long explanation, but it seems important to clarify the basic definitions and make sure the same values are being discussed.
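To make the point in that quote concrete, here is a small Python sketch of the two ways of averaging. It assumes the standard Phred relation (error probability = 10^(-Q/10)); the function name and the two-base example are made up for illustration, and this is not code from seqkit, dorado, or the workflow.

```python
import math


def mean_qscore(scores):
    """Mean quality computed via the mean error probability.

    Each Phred score Q is converted to an error probability 10**(-Q/10),
    the probabilities are averaged, and the mean is converted back to
    the Phred scale.
    """
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


scores = [10, 30]                        # error probabilities 0.1 and 0.001
arithmetic = sum(scores) / len(scores)   # simple mean of the Q values
error_based = mean_qscore(scores)        # mean taken on the error scale

print(f"arithmetic mean of Q: {arithmetic:.2f}")
print(f"error-based mean Q:   {error_based:.2f}")
```

For these two bases the arithmetic mean is 20, while the error-based mean is about 12.97, because the low-quality base dominates the mean error rate.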
Many thanks, Valery
Does the workflow use the same route to get statistics information?
No. The workflow's functionality predates the dorado summary command.
so is the value for the qscore of the read in this report generated by a workflow the same as the value from a "mean_qscore_template" column from the output table of the command dorado summary?
These numbers are similar. dorado trims an arbitrary 60 bases from the front of reads when calculating a mean quality score. The program the workflow uses to create this statistic does not apply such trimming.
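As an illustration of why the two numbers can differ, the sketch below computes an error-based mean quality with and without dropping the first 60 bases. It is not the actual code of dorado or of the workflow; the function name, the synthetic read, and the choice to leave reads of 60 bases or fewer untrimmed are all assumptions.

```python
import math


def mean_qscore(scores, trim_front=0):
    """Error-based mean quality, optionally skipping leading bases.

    Reads no longer than `trim_front` are left untrimmed so the mean
    stays defined (an assumption made for this sketch).
    """
    if len(scores) > trim_front:
        scores = scores[trim_front:]
    mean_error = sum(10 ** (-q / 10) for q in scores) / len(scores)
    return -10 * math.log10(mean_error)


# Synthetic read: 60 low-quality bases at the start, then 940 bases at Q20.
read_scores = [5] * 60 + [20] * 940

untrimmed = mean_qscore(read_scores)               # no trimming
trimmed = mean_qscore(read_scores, trim_front=60)  # first 60 bases dropped

print(f"untrimmed: {untrimmed:.2f}")
print(f"trimmed:   {trimmed:.2f}")
```

With this synthetic read the untrimmed value is about 15.47 and the trimmed value is 20, so a read with a poor-quality start scores lower when no trimming is applied.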
Hi,
Thank you for your explanation. It would be very useful if a table with Q scores were added.
Best regards, Valery
Hello,
Thank you for a useful workflow! I have one question. Previously (with guppy), a sequencing report was generated, and tools used it to extract all the statistics. I found a way to generate this file using Dorado and some additional steps, but it is more convenient to use the wf-basecalling workflow. Could you please advise how statistics (for example, the average quality of reads) can be extracted? I see only a "Read quality" graph in the file "wf_basecalling_report.html", with the option to download the graph (not a table).
Many thanks, Valery