epi2me-labs / pychopper

cDNA read preprocessing

Barcode sequence still in reads after trimming #64

Open NHoang98 opened 1 month ago

NHoang98 commented 1 month ago

Hello, our lab is currently trying the cDNA-PCR barcoding kit (PCB111.24) for our de novo transcriptome assembly project. Our issue is that after running pychopper, the reads still contain the barcode sequences and PCR adapters when we re-check them with porechop.

From what I understand, an ideal full-length read has this type of structure:

RAP T-Barcode-SSP-sequence-VNP/RT primer-Barcode-RAP T

So after detecting the full-length sequence by identifying and trimming the SSP and RT primer at both ends, we should be left with only the target sequence.
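To make that expectation concrete, here is a minimal Python sketch of the layout above. All sequences are made-up placeholders (not the real kit oligos), and `build_read` and `trim_between_primers` are hypothetical helpers that use exact string matching on a single orientation. This is only an illustration of the logic, not how pychopper is implemented.

```python
# Placeholder sequences for illustration only (NOT the real kit oligos).
RAP = "AAAA"
BARCODE = "CCGGCCGG"
SSP = "TTGTTG"
VNP = "GAGGAG"
INSERT = "ACGTACGTTAGC"


def build_read(insert):
    """Assemble a read in the order RAP-Barcode-SSP-insert-VNP-Barcode-RAP."""
    return RAP + BARCODE + SSP + insert + VNP + BARCODE + RAP


def trim_between_primers(read, ssp=SSP, vnp=VNP):
    """Return the sequence between the first SSP and the last VNP.

    Returns None if either primer is missing or they are in the wrong order.
    """
    start = read.find(ssp)
    end = read.rfind(vnp)
    if start == -1 or end == -1 or end < start + len(ssp):
        return None
    return read[start + len(ssp):end]


read = build_read(INSERT)
trimmed = trim_between_primers(read)
print(trimmed)             # the insert only
print(BARCODE in trimmed)  # whether any barcode copy survived trimming
```

If both primers are found, everything outside them (including both barcode copies) falls away with the flanks, which is why barcodes in the full-length output would be unexpected.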

In detail: we ran pychopper on our libraries after demultiplexing with Dorado, each barcode separately: pychopper -k PCS11 -r report.pdf -u unclassified.fq -w rescued.fq input.fq full_length_output.fq

The full-length files were then checked with porechop, using: porechop -i full_length_output.fq -o output_reads.fq The result shows a 100% match with the barcodes used for each library and with the PCR adapter!

Our question: is our pipeline on the right track, or have we misunderstood something?

nrhorner commented 1 month ago

Hi @NHoang98

Yes, the barcodes should be trimmed if the VNP and SSP have been correctly identified and removed. Are you able to post an example affected sequence?
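One possible way to pull out such an example is sketched below in Python: it scans a FASTQ file for reads that still carry a given barcode or adapter sequence near either end. The motif, the 150-base window, and the helper names (`revcomp`, `read_fastq`, `reads_with_motif`) are assumptions for illustration. Matching is exact, so copies containing sequencing errors will be missed, and the parser assumes plain 4-line FASTQ records.

```python
import io


def revcomp(seq):
    """Reverse-complement a DNA sequence (A/C/G/T only)."""
    comp = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(comp)[::-1]


def read_fastq(handle):
    """Yield (name, sequence) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        yield header.strip()[1:], seq


def reads_with_motif(handle, motif, window=150):
    """Yield reads with the motif (either strand) in the first or last `window` bases."""
    motif = motif.upper()
    targets = (motif, revcomp(motif))
    for name, seq in read_fastq(handle):
        head, tail = seq[:window].upper(), seq[-window:].upper()
        if any(t in head or t in tail for t in targets):
            yield name, seq


# Tiny in-memory demo; "AACCGG" is a placeholder motif, not a real barcode.
demo = io.StringIO(
    "@read1\nAACCGGACGTACGTAC\n+\nIIIIIIIIIIIIIIII\n"
    "@read2\nACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIII\n"
    "@read3\nACGTACGTACCCGGTT\n+\nIIIIIIIIIIIIIIII\n"
)
hits = list(reads_with_motif(demo, "AACCGG"))
for name, seq in hits:
    print(name, seq)
```

To run it on real data, replace the demo handle with an opened file such as `open("full_length_output.fq")` and pass the barcode sequence that porechop reports.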

NHoang98 commented 2 weeks ago

Hi @nrhorner,

I am very sorry for the late response.

This is the log file produced by pychopper: A2.1_pychopper.log

And here is the log file from porechop (which was run afterwards): A2_trimmed.log