epi2me-labs / pychopper

cDNA read preprocessing

Issue with unclassified reads LSK114 kit #67

Open NikoLichi opened 3 months ago

NikoLichi commented 3 months ago

Hi There,

I ran Pychopper on reads basecalled with the Dorado SUP basecaller v0.7.2 on a Linux server.

I used a command similar to:

pychopper -r ${SAMPLEID}_report.pdf \
-k LSK114 \
-S ${SAMPLEID}_stats.tsv \
-u ${SAMPLEID}_unclassified.fq \
-w ${SAMPLEID}_rescued.fq \
$SAMPLE ${SAMPLEID}_trim.fastq \
-t 16 \
-m edlib

However, when checking some sample files, the unclassified output seems rather large. Inside it, I also still find the primer sequences provided for kit LSK114.

For instance, one unclassified file is 706M with about 683,121 reads.
I found 199,881 reads with the VNP primer (ACTTGCCTGTCGCTCTATCTTCTTTTT) and 222,204 reads with the SSP primer (TTTCTGTTGGTGCTGATATTGCT). I think these reads could actually be detected, trimmed and written to the _trim.fastq file, no? What could be happening here?
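
For anyone who wants to reproduce counts like these, here is a minimal sketch of one way to get them. It is not part of Pychopper, and the function names are made up for illustration. It assumes a plain 4-line-per-record FASTQ and uses exact substring matching on the primer and its reverse complement, so primers carrying sequencing errors are not counted.

```python
# Sketch only: count FASTQ reads that contain a primer in either orientation.
# Assumptions: 4 lines per record, exact matching (no mismatches or indels).

VNP = "ACTTGCCTGTCGCTCTATCTTCTTTTT"   # VNP primer quoted in this thread
SSP = "TTTCTGTTGGTGCTGATATTGCT"       # SSP primer quoted in this thread

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_sequences(lines):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def count_primer_hits(lines, primers):
    """Return (total_reads, {primer_name: reads_containing_it})."""
    counts = {name: 0 for name in primers}
    total = 0
    for seq in read_sequences(lines):
        total += 1
        for name, primer in primers.items():
            # A read counts once per primer, whichever strand it appears on.
            if primer in seq or reverse_complement(primer) in seq:
                counts[name] += 1
    return total, counts
```

To use it, open the unclassified FASTQ file and pass the file handle as `lines`, together with `{"VNP": VNP, "SSP": SSP}` as `primers`. A read that contains both primers is counted under each, so the two counts can overlap.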

Thanks and all the best, Nicolas

nrhorner commented 3 months ago

Hi @NikoLichi

I have been able to recreate this, and can find sequences in the unclassified output that contain the primers and should not be there. I will look into this ASAP and get back to you.

Thanks,

Neil

NikoLichi commented 3 months ago

Hi @nrhorner,

Thanks for the reply! Looking forward to hearing back from you with a solution :)

All the best, Nicolas

NikoLichi commented 2 months ago

Hi @nrhorner, Is there any update on this? Kind regards, Nicolas

NikoLichi commented 2 months ago

Hi @nrhorner or anyone seeing this thread,

Sorry for asking again, but... is there any update on this? 🙏🏽

All the best, Nicolas