Open mentorwan opened 1 week ago
For example, in one sample, stats.tsv shows 24,049 non-chimeric reads, but the DADA2-generated biom file, qzv file, and taxonomy table show only 24,025 reads.
Can you clarify what workflow you are using and how these different numbers are being generated?
We use the HiFi full-length 16S workflow: https://github.com/PacificBiosciences/HiFi-16S-workflow
The numbers come from the output of this pipeline. Here is the row in stats.tsv for this sample:
| sample-id | input | filtered | denoised | non-chimeric | percentage of input non-chimeric |
| -- | -- | -- | -- | -- | -- |
| SC830317 | 39431 | 24600 | 24132 | 24049 | 60.99 |
We ran a full-length PacBio DADA2 analysis and ran into a question: there is some minor read loss during the DADA2 process. For example, in one sample, stats.tsv shows 24,049 non-chimeric reads, but the DADA2-generated biom file, qzv file, and taxonomy table show only 24,025 reads, a loss of 24 reads.
I had assumed the read count in these outputs would match the number of non-chimeric reads after QC. The loss is minimal, and when I checked other samples, some showed no loss while others lost only a few reads.
Maybe it’s not a significant issue. Could you clarify our understanding or provide any related information we might be missing? Thanks.
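For anyone who wants to check the same thing across all samples, here is a minimal Python sketch of the comparison described above. It is only an illustration: the `read_loss` helper and the `TABLE_TOTALS` dict are hypothetical names, the stats row is copied from the table in this thread, and the per-sample totals would have to be exported from the biom/qzv table by whatever means you already use. Skipping lines that start with `#` is an assumption about how the real stats.tsv may be laid out.

```python
import csv
import io

# Row copied from the stats.tsv excerpt in this thread (tab-separated).
STATS_TSV = (
    "sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\n"
    "SC830317\t39431\t24600\t24132\t24049\n"
)

# Hypothetical stand-in for per-sample read totals taken from the
# feature table (biom/qzv); 24025 is the value reported in this thread.
TABLE_TOTALS = {"SC830317": 24025}


def read_loss(stats_text, table_totals):
    """Return {sample: (non_chimeric, table_total, lost)} for shared samples."""
    # Assumption: any line starting with '#' is a directive/comment line
    # and not a sample row, so it is skipped.
    lines = [ln for ln in stats_text.splitlines() if not ln.startswith("#")]
    result = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        sample = row["sample-id"]
        if sample not in table_totals:
            continue  # sample missing from the feature table totals
        non_chimeric = int(row["non-chimeric"])
        total = table_totals[sample]
        result[sample] = (non_chimeric, total, non_chimeric - total)
    return result


loss = read_loss(STATS_TSV, TABLE_TOTALS)
for sample, (non_chimeric, total, lost) in loss.items():
    print(f"{sample}: non-chimeric={non_chimeric} table={total} lost={lost}")
```

For SC830317 this reports a difference of 24 reads, matching the numbers above. It only quantifies the gap per sample; it does not say where in the pipeline the reads are dropped.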