a-slide / pycoQC

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report generated by Oxford Nanopore Technologies basecallers (Albacore/Guppy).
https://a-slide.github.io/pycoQC/
GNU General Public License v3.0

Different number of reads by barcode #130

Open valery-shap opened 3 years ago

valery-shap commented 3 years ago

Hello,

I am using pycoQC 2.5.2. There are no errors, but the number of reads per barcode in the HTML report differs from the counts I get by other methods. I counted the pass reads in bash with (zcat |wc -l) /4|bc, and I also parsed the sequencing_summary file with the conditions 'barcode_arrangement' == 'barcode02' & 'passes_filtering' == True. The bash count and the parsed count are identical. What could be the mistake?

Update: I counted the reads that pass the Guppy filter, which is described as: "The minimum q-score a read must attain to pass qscore filtering. The default value for this varies by configuration: for faster models it is 7.0, roughly corresponding to an accuracy of 85%, and is higher for more accurate models. This should have a minimal impact on output." I used the accurate basecalling model, so the filtering threshold is not always 7.0. I suppose that pycoQC counts the reads that have mean_qscore_template > 7, which would explain why its per-barcode counts are higher than mine.

Best regards, Valery
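
The discrepancy described above can be reproduced on a small table. The following is a minimal sketch, not pycoQC's actual code: it counts reads per barcode once using the basecaller's passes_filtering flag and once using a fixed cutoff on mean_qscore_template. The sample rows are made up for illustration, the function name count_by_barcode is hypothetical, and the use of >= with a cutoff of 7.0 is an assumption about how a fixed threshold would be applied.

```python
import csv
import io
from collections import Counter

# Made-up rows that mimic the relevant columns of a sequencing summary file.
# A real file has many more columns; only these three are used here.
SAMPLE = (
    "read_id\tbarcode_arrangement\tpasses_filtering\tmean_qscore_template\n"
    "r1\tbarcode02\tTRUE\t12.4\n"
    "r2\tbarcode02\tFALSE\t8.1\n"
    "r3\tbarcode02\tFALSE\t7.5\n"
    "r4\tbarcode02\tFALSE\t5.0\n"
    "r5\tbarcode01\tTRUE\t10.2\n"
)


def count_by_barcode(handle, min_qual=7.0):
    """Return two per-barcode counters: one based on the basecaller's
    passes_filtering flag, one based on a fixed q-score cutoff."""
    by_flag = Counter()
    by_qscore = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        barcode = row["barcode_arrangement"]
        # Count 1: trust the flag written by the basecaller.
        if row["passes_filtering"].strip().upper() == "TRUE":
            by_flag[barcode] += 1
        # Count 2: recompute "pass" from the mean q-score (assumed rule).
        if float(row["mean_qscore_template"]) >= min_qual:
            by_qscore[barcode] += 1
    return by_flag, by_qscore


by_flag, by_qscore = count_by_barcode(io.StringIO(SAMPLE))
for barcode in sorted(set(by_flag) | set(by_qscore)):
    print(barcode, "flag:", by_flag[barcode], "qscore cutoff:", by_qscore[barcode])
```

In this sample the two counts agree for barcode01 but differ for barcode02, because two reads sit between the fixed cutoff and the stricter threshold that produced the flag.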

OKyne1 commented 9 months ago

Yep I encountered this problem too and spent a long time trying to find missing reads. Thanks for pointing it out valery-shap.

It looks like you can specify the minimum pass value in pycoQC, so as long as you set it to match the threshold your basecaller used, this shouldn't be an issue.
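
If the threshold used by the basecaller is not known, it can be bracketed from the sequencing summary itself. The sketch below is a heuristic under stated assumptions, not part of pycoQC: it takes the highest mean_qscore_template among failed reads and the lowest among passed reads, and the real threshold should lie between the two, provided that q-score is the only reason reads fail. The sample rows are made up, and the function name bracket_pass_threshold is hypothetical. I believe the corresponding pycoQC option is --min_pass_qual, but check pycoQC --help for your version.

```python
import csv
import io

# Made-up rows that mimic the relevant columns of a sequencing summary file.
SAMPLE = (
    "read_id\tpasses_filtering\tmean_qscore_template\n"
    "r1\tTRUE\t12.4\n"
    "r2\tFALSE\t8.1\n"
    "r3\tTRUE\t9.3\n"
    "r4\tFALSE\t5.0\n"
    "r5\tTRUE\t10.2\n"
)


def bracket_pass_threshold(handle):
    """Return (highest failing q-score, lowest passing q-score).

    Either value is None if the file has no reads in that group.
    """
    passing = []
    failing = []
    for row in csv.DictReader(handle, delimiter="\t"):
        qscore = float(row["mean_qscore_template"])
        if row["passes_filtering"].strip().upper() == "TRUE":
            passing.append(qscore)
        else:
            failing.append(qscore)
    highest_fail = max(failing) if failing else None
    lowest_pass = min(passing) if passing else None
    return highest_fail, lowest_pass


highest_fail, lowest_pass = bracket_pass_threshold(io.StringIO(SAMPLE))
print("threshold lies between", highest_fail, "and", lowest_pass)
```

With a real run containing many reads, the two bounds are usually close together, which makes it easy to pick the value to pass to pycoQC.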