epi2me-labs / pychopper

cDNA read preprocessing

The full-length sequences are shorter than the derRNA sequences. #62

Closed biochristmas closed 3 months ago

biochristmas commented 4 months ago

I ran the following command to identify full-length transcripts: pychopper -k PCS111 -r CK.report.pdf -u CK.unclassified.fq -w.rescued.fq CK.derRNA.fastq CK.full_length_output.fq. I then computed length statistics: the input reads (CK.derRNA.fastq) have a minimum length of 500 bp, but the full-length transcripts (CK.full_length_output.fq) have a minimum length of 50 bp. Since the full-length transcripts are derived from the input reads, why are some of them shorter than any input read?
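To reproduce this kind of comparison, here is a minimal stdlib-only sketch for computing read-length statistics. It assumes plain (uncompressed) FASTQ with exactly four lines per record. The function names `fastq_read_lengths` and `length_summary` are hypothetical helpers, not part of pychopper.

```python
from typing import Iterator, Optional, TextIO


def fastq_read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        # Basic sanity checks; assumes no multi-line sequences.
        if not header.startswith("@") or not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError("malformed FASTQ record")
        yield len(seq)


def length_summary(path: str) -> dict:
    """Return the read count and the minimum/maximum read length of a FASTQ file."""
    with open(path) as handle:
        lengths = list(fastq_read_lengths(handle))
    if not lengths:
        return {"count": 0, "min": None, "max": None}
    return {"count": len(lengths), "min": min(lengths), "max": max(lengths)}
```

For example, calling length_summary("CK.derRNA.fastq") and length_summary("CK.full_length_output.fq") shows the minimum lengths of the input and output side by side.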

nrhorner commented 4 months ago

Hi @biochristmas

So you are finding that the minimum length of the input reads is 500 bp, while that of the processed reads is 50 bp?

This is likely because some of the input reads are chimeric and have been split into multiple, shorter subreads.
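The toy sketch below shows why splitting lowers the minimum length. It assumes each detected segment of a chimeric read is given as a half-open (start, end) interval on the read. This representation, the `split_read` helper, and the coordinates in the usage note are all hypothetical and do not reflect pychopper's internal implementation.

```python
from typing import List, Tuple

# Hypothetical: one detected segment as a half-open [start, end) interval
# on the read. This is NOT pychopper's internal data model.
Segment = Tuple[int, int]


def split_read(sequence: str, segments: List[Segment]) -> List[str]:
    """Cut one read into subreads, one per detected segment, in read order."""
    subreads = []
    for start, end in sorted(segments):
        if not 0 <= start < end <= len(sequence):
            raise ValueError(f"segment {(start, end)} lies outside the read")
        subreads.append(sequence[start:end])
    return subreads
```

With a 500-bp read and made-up segments (30, 80) and (110, 470), split_read returns subreads of 50 bp and 360 bp. Both are shorter than the parent read, so the output's minimum length can fall well below the input's.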

nrhorner commented 3 months ago

Closing due to no response