a-slide / pycoQC

pycoQC computes metrics and generates interactive QC plots from the sequencing summary report produced by the Oxford Nanopore Technologies basecallers (Albacore/Guppy).
https://a-slide.github.io/pycoQC/
GNU General Public License v3.0

PycoQC fails when fastq is trimmed using Pychopper. #136

Closed: bernardo-heberle closed this issue 1 year ago

bernardo-heberle commented 2 years ago

Hello,

pycoQC exits with an error when I run it with a BAM file generated from a FASTQ file that was trimmed using Pychopper. When I do not use Pychopper, pycoQC works fine.

Here is a file containing the error message displayed by PycoQC: pycoQC_error_pychopper.txt

Has anyone run into a similar issue before, and is there a simple solution? If not, I can provide a small test dataset and more details to help track down the cause.

Thank you for your time.

bernardo-heberle commented 1 year ago

After months I finally figured out how to resolve this!

Pychopper assigns new read IDs, so after you pre-process your FASTQ with Pychopper, the read IDs in the BAM file no longer match the read_id values in sequencing_summary.txt. This mismatch causes pycoQC to fail with the error message posted above.
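Because the BAM read names come straight from the FASTQ, one quick way to confirm this mismatch before running pycoQC is to count how many FASTQ read names appear in the summary's read_id column. This is a minimal sketch, assuming an uncompressed, tab-separated summary with a read_id header column and a FASTQ with standard four-line records; the function name `id_overlap` is hypothetical. A match count near zero for the Pychopper FASTQ points to the renaming problem.

```python
from typing import Tuple


def id_overlap(summary_path: str, fastq_path: str) -> Tuple[int, int]:
    """Return (FASTQ reads whose name is in the summary, total FASTQ reads).

    Hypothetical helper. Assumes a tab-separated summary with a 'read_id'
    header column and an unwrapped FASTQ (4 lines per record).
    """
    with open(summary_path) as fh:
        columns = fh.readline().rstrip("\r\n").split("\t")
        id_col = columns.index("read_id")
        summary_ids = {
            line.rstrip("\r\n").split("\t")[id_col]
            for line in fh
            if line.strip()  # skip blank lines
        }
    with open(fastq_path) as fh:
        # Every 4th line (0, 4, 8, ...) is a header: '@name [description]'
        fastq_ids = [line[1:].split()[0] for i, line in enumerate(fh) if i % 4 == 0]
    matched = sum(1 for read_id in fastq_ids if read_id in summary_ids)
    return matched, len(fastq_ids)
```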

I wrote a Python script that takes the FASTQ produced by Pychopper and the original sequencing_summary.txt file, and writes a new sequencing_summary.txt file whose read IDs match the Pychopper read names. Here is the script:
fix_sequencing_summary_pychopper.txt

Running pycoQC with the updated sequencing_summary.txt file produced by this script, together with the BAM file obtained by aligning the Pychopper-processed FASTQ, works as expected.
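The attached script is not reproduced in this thread. Below is a minimal sketch of the same idea, not the author's code. The helper names (`rebuild_summary`, `fastq_read_names`, `original_id`) are hypothetical, and the rule in `original_id` is an assumption: it takes the original read ID to be the last `|`-separated field of the Pychopper read name. Check this against the headers in your own trimmed FASTQ and pass a different `to_original` function if your names differ. It also assumes a tab-separated summary with a read_id column and an unwrapped, four-line-per-record FASTQ (optionally gzipped).

```python
import gzip
from typing import Callable, Dict, Iterator, List, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or .gz file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_read_names(path: str) -> Iterator[str]:
    """Yield read names from a FASTQ with 4 lines per record."""
    with open_text(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:  # header line: '@name [description]'
                yield line[1:].split()[0]


def original_id(pychopper_name: str) -> str:
    # ASSUMPTION: the original read ID is the last '|' field of the new name.
    return pychopper_name.rsplit("|", 1)[-1]


def rebuild_summary(
    summary_path: str,
    fastq_path: str,
    out_path: str,
    to_original: Callable[[str], str] = original_id,
) -> int:
    """Write a summary keyed by Pychopper read names; return rows written."""
    # One original read can yield several trimmed segments (new names).
    new_names: Dict[str, List[str]] = {}
    for name in fastq_read_names(fastq_path):
        new_names.setdefault(to_original(name), []).append(name)

    written = 0
    with open_text(summary_path) as src, open(out_path, "w") as dst:
        columns = src.readline().rstrip("\r\n").split("\t")
        id_col = columns.index("read_id")
        dst.write("\t".join(columns) + "\n")
        for line in src:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\r\n").split("\t")
            # Copy the original row once per trimmed segment; reads that
            # Pychopper did not output are dropped.
            for name in new_names.get(fields[id_col], []):
                fields[id_col] = name
                dst.write("\t".join(fields) + "\n")
                written += 1
    return written
```

For example, `rebuild_summary("sequencing_summary.txt", "pychopper_out.fastq", "sequencing_summary.pychopper.txt")`, with hypothetical file names. Note the design choice: each summary row is duplicated for every segment derived from a read, so per-read metrics such as length and quality still describe the untrimmed read, and reads that Pychopper discarded are left out of the new summary.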