Pychopper produces UMIs with nucleotide "N"

AdnanAbouelela commented 1 year ago

Hi everyone,

Thank you for providing this useful tool! I noticed that some of the UMIs detected by Pychopper contain N nucleotides and was wondering whether this is intentional or an artifact. The Ns appear to be at the end of the UMI and occur at a very low frequency (e.g. 50 reads in roughly 1M).

Specifically, the library has been prepared using the ONT PCB111.24 (PCR-cDNA Barcoding) kit and was sequenced on a FLO-MIN106D (R9.4.1) with an MK1C. The reads were manually basecalled and demultiplexed using guppy (dna_r9.4.1_450bps_hac.cfg). A minimal fastq file containing 4 reads from this sequencing run is uploaded here:

https://drive.google.com/drive/folders/17iotOe6jDRXF7-5a-36SkWT1m1vBj8dg?usp=sharing

When running Pychopper (version 2.7.2, environment yaml attached) using

singularity exec ../../singularity/pychopper_ubuntu-22.04.sif pychopper -r out/report.pdf -u out/unclassified.fq -w out/rescued.fq -S out/stats.tsv -k PCS111 -q 1e-05 -U mre.fastq out/mre_out.fq

two reads, on either the + or the - strand, contain correct UMIs, whereas the other two reads produce the UMIs "TTTAGGCTTGAGCTTACCGTTGGACTTN" (- strand) and "NNNCCCCTTAAGATTCGGGTTGGGATTT" (+ strand). All output files are accessible in the drive. I also noticed that in the latter case the detected UMI lies within the reported sequence, and it is unclear to me why such cases would be included. The cut-off parameter of 1e-05 was determined based on the full dataset (the report PDF is attached as well).

Currently, I am just discarding those "artifactual" reads, but I am curious as to why this happens. Please let me know if I need to provide any additional information.
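
For anyone who wants to apply the same stopgap, the discarding step could look roughly like the sketch below. It only checks UMI strings for bases other than A, C, G and T; how the UMI is pulled out of the Pychopper output is deliberately left out, since that depends on the output format. The function names and the third example UMI are hypothetical and not part of Pychopper.

```python
# Minimal sketch: separate UMIs that contain ambiguous bases (such as N)
# from UMIs made up only of A, C, G and T.
# Assumption: the UMI strings have already been extracted from the
# Pychopper output by some other means; this code does not parse FASTQ.

VALID_BASES = frozenset("ACGT")


def umi_is_unambiguous(umi: str) -> bool:
    """Return True if the UMI is non-empty and contains only A, C, G, T."""
    return bool(umi) and set(umi.upper()) <= VALID_BASES


def split_umis(umis):
    """Split an iterable of UMI strings into (clean, flagged) lists."""
    clean, flagged = [], []
    for umi in umis:
        if umi_is_unambiguous(umi):
            clean.append(umi)
        else:
            flagged.append(umi)
    return clean, flagged


if __name__ == "__main__":
    examples = [
        "TTTAGGCTTGAGCTTACCGTTGGACTTN",  # reported above (- strand)
        "NNNCCCCTTAAGATTCGGGTTGGGATTT",  # reported above (+ strand)
        "TTTAGGCTTGAGCTTACCGTTGGACTTT",  # hypothetical UMI without N
    ]
    clean, flagged = split_umis(examples)
    print(f"kept {len(clean)} UMI(s), flagged {len(flagged)} UMI(s)")
```

This keeps the flagged UMIs in a separate list instead of dropping them silently, so they can still be counted or inspected later.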

Thank you so much! Adnan

nrhorner commented 1 year ago

Hi @AdnanAbouelela

Thanks for raising this issue. I am looking into it now.

nrhorner commented 1 year ago

I am able to replicate this problem. Thanks for the test data!

nrhorner commented 1 year ago

Hi @AdnanAbouelela

V2.7.7 has been released, which should fix the issue you reported. Please get in touch again if you have any more problems.

Thanks,

Neil

epi2me-labs / pychopper, issue #36