epi2me-labs / wf-single-cell


Recommendation for read filtering and trimming #107

Closed HenriettaHolze closed 5 months ago

HenriettaHolze commented 5 months ago

Ask away!

Hi, I'm wondering what your recommendation for read trimming and filtering is.
The nf-core scnanoseq pipeline by default filters reads with a minimum average read quality score of 10 and a minimum length of 500 bp. I also observed in my data that the bases at the end of a read can have very low Phred scores (an error probability of about 50%).
Would you recommend filtering low-quality reads and trimming low-quality bases at the end of a read before running the pipeline? Does this affect read splitting and mapping?
The read quality score (I assume the average Phred score) doesn't go below 10 in my data based on the pipeline report, so maybe you already apply some filtering?
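
For illustration, the kind of pre-filter described above (minimum mean read quality of 10, minimum length of 500 bp) could be sketched as follows. This is a minimal sketch and not code from wf-single-cell or scnanoseq: the function names are hypothetical, Phred+33 quality encoding is assumed, and the mean quality is assumed to be computed by averaging per-base error probabilities rather than averaging the Phred scores directly.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def phred_to_error(q: float) -> float:
    """Convert a Phred score to an error probability (Q3 is roughly 0.5)."""
    return 10 ** (-q / 10)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean read quality, averaging error probabilities (assumed convention)."""
    if not qual:
        return 0.0
    mean_err = sum(phred_to_error(ord(c) - offset) for c in qual) / len(qual)
    return -10 * math.log10(mean_err)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse simple four-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def filter_reads(
    records: Iterable[Record], min_mean_q: float = 10.0, min_len: int = 500
) -> Iterator[Record]:
    """Keep reads that meet both the length and mean-quality thresholds."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_quality(qual) >= min_mean_q:
            yield header, seq, qual
```

With a plain-text FASTQ file this could be used as `filter_reads(read_fastq(open("reads.fastq")))`, where `reads.fastq` is a placeholder path.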

nrhorner commented 5 months ago

Hi @HenriettaHolze, there is currently no quality filtering done on the input reads. That's possibly something we should add. It does look like your data has been pre-filtered to a read quality score of 10.

I would not trim the reads. If the quality at the end of a read is that bad, it is unlikely that a valid adapter sequence or barcode could be identified there, and such reads would then not be processed by the workflow.
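
To judge how poor the read ends actually are before deciding anything, one could compare the mean quality of the last bases of a read with the rest of it. The sketch below is only an illustration: the function names and the 50-base window are arbitrary assumptions, Phred+33 encoding is assumed, and none of this is part of the workflow.

```python
import math
from typing import Tuple


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean quality of a non-empty quality string via mean error probability."""
    errs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(errs) / len(errs))


def body_and_tail_quality(qual: str, tail: int = 50) -> Tuple[float, float]:
    """Return (mean Q of the read body, mean Q of the last `tail` bases)."""
    if len(qual) <= tail:
        raise ValueError("read is not longer than the tail window")
    return mean_quality(qual[:-tail]), mean_quality(qual[-tail:])
```

A tail value near Q3 alongside a much higher body value would match the roughly 50% error probability mentioned in the question.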

nrhorner commented 5 months ago

Closing due to lack of response