OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Shouldn't polyA trimming happen after adaptor trimming? #132

Closed: ChenfuShi closed this issue 5 years ago

ChenfuShi commented 5 years ago

I'm working with 3' RNA-seq data with polyT priming. In the resulting reads I still get polyA tails. I think this happens because the adaptor is trimmed after the polyA trimming. Shouldn't it happen in the other order?

sfchen commented 5 years ago

What's your command? Did you enable polyX trimming with --trim_poly_x?

whhyh1314 commented 5 years ago

It really does have this problem: according to the source code, the adaptor is trimmed after the polyA trimming. You can change the order by editing the "peprocessor.cpp" source file, or the author could add an option to select the order in a future update.

sfchen commented 5 years ago

Currently the order is:

1. UMI preprocessing (--umi)
2. global trimming at front (--trim_front)
3. global trimming at tail (--trim_tail)
4. quality pruning at 5' (--cut_front)
5. quality pruning by sliding window (--cut_right)
6. quality pruning at 3' (--cut_tail)
7. trim polyG (--trim_poly_g, enabled by default for NovaSeq/NextSeq data)
8. trim polyX (--trim_poly_x)
9. trim adapter by overlap analysis (enabled by default for PE data)
10. trim adapter by adapter sequence (--adapter_sequence, --adapter_sequence_r2; for PE data, this step is skipped if the previous step succeeded)
11. trim to max length (--max_len)
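To see why the order of steps 8 to 10 matters for 3' RNA-seq reads, here is a minimal Python sketch. It assumes a read made of an insert, then a polyA tail, then adapter sequence. The helper names (trim_poly_x, trim_adapter, run_pipeline), the 10-base minimum homopolymer length and the use of the common Illumina adapter prefix AGATCGGAAGAGC are illustrative assumptions, and both trimmers are deliberately simplified (exact matches only, no mismatches), so this is not fastp's actual implementation.

```python
ADAPTER = "AGATCGGAAGAGC"  # common Illumina adapter prefix, used here only as an example


def trim_poly_x(seq, min_len=10):
    """Remove a 3' run of at least min_len identical bases (simplified: no mismatches)."""
    if not seq:
        return seq
    run = len(seq) - len(seq.rstrip(seq[-1]))  # length of the trailing homopolymer
    return seq[:-run] if run >= min_len else seq


def trim_adapter(seq, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


def run_pipeline(seq, steps):
    """Apply trimming steps in the given order."""
    for step in steps:
        seq = step(seq)
    return seq


INSERT = "ACGTTGCAGTCCATGC"
READ = INSERT + "A" * 15 + ADAPTER  # insert, polyA tail, then adapter

# Order listed above: polyX trimming (step 8) before adapter trimming (steps 9/10)
print(run_pipeline(READ, [trim_poly_x, trim_adapter]))  # insert followed by 15 A's: polyA survives
# Reversed order: adapter trimming first, then polyX trimming
print(run_pipeline(READ, [trim_adapter, trim_poly_x]))  # ACGTTGCAGTCCATGC
```

With polyX trimming first, the read still ends in adapter sequence, so there is no 3' homopolymer to remove; once the adapter is cut, the polyA tail is exposed, but the polyX step has already run.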

As @whhyh1314 pointed out, I may add options to change the order. Until then, you can change the order in the source.

sfchen commented 5 years ago

Probably putting polyX trimming after adapter trimming is a better solution.

whhyh1314 commented 5 years ago

Yes, adding options to change the order of the different steps would be ideal for users.

sfchen commented 5 years ago

After some consideration, I think doing polyX trimming after adapter trimming is better.

I have submitted a commit to change the order; please try the latest version (0.19.7) and report back here.

ChenfuShi commented 5 years ago

Thanks a lot!

sfchen commented 5 years ago

Could you please upload some poly-A data for me to test?

sklages commented 5 years ago

Hmm, okay. But is there any reason why "global trimming" (--trim_front / --trim_tail) takes place before all other trimming steps?

For example, I need this option to trim 10 bases off the final, clean product (after low-quality bases, Illumina adaptors and polyA/G have been removed). IMHO it makes no sense to hard-clip data in a "more-than-one-thing-to-trim" workflow before the sequence/pattern-based trimming steps.

So I would need to run fastp <quality/Illumina/polyAG params> | fastp <hard_clip params>. Maybe you could provide a user-controlled parameter to specify when hard clipping (--trim_front/--trim_tail) runs: before or after the sequence/pattern-based trimming steps?
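The effect of hard clipping before versus after the pattern-based steps can be sketched in a few lines of Python. This is a simplified illustration with hypothetical helpers (trim_tail for a fixed-length 3' clip, trim_adapter for exact adapter matching) and a made-up read whose insert runs into adapter sequence; it is not fastp code.

```python
ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix


def trim_adapter(seq, adapter=ADAPTER):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    pos = seq.find(adapter)
    return seq if pos == -1 else seq[:pos]


def trim_tail(seq, n):
    """Hard-clip a fixed number of bases from the 3' end."""
    return seq[: max(len(seq) - n, 0)]


INSERT = "ACGTTGCAGTCCATGC" * 2            # 32-bp insert
READ = INSERT + ADAPTER + "ACACGTCTGA"     # adapter plus 10 made-up downstream bases

clip_first = trim_adapter(trim_tail(READ, 10))  # hard clip, then adapter trimming
clip_last = trim_tail(trim_adapter(READ), 10)   # adapter trimming, then hard clip

print(len(clip_first), len(clip_last))  # 32 22
```

Clipping first only removes bases that adapter trimming would have discarded anyway, so the clean insert is not shortened; clipping last shortens the clean product by exactly 10 bases, which is the behaviour asked for above.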

sfchen commented 5 years ago

Global trimming is used to fix sequencer artifacts, for example --trim_tail=1 to cut the 151st cycle in 150 bp PE sequencing.
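Global trimming is purely positional: it drops a fixed number of cycles regardless of sequence content, which is why it suits a known sequencer artifact such as an extra 151st cycle. A minimal Python sketch, assuming a hypothetical FastqRecord type, that removes the last cycle from both the sequence and the quality string:

```python
from dataclasses import dataclass


@dataclass
class FastqRecord:
    """Hypothetical minimal FASTQ record: name, bases and per-base quality string."""
    name: str
    seq: str
    qual: str


def global_trim(rec, front=0, tail=0):
    """Drop fixed numbers of cycles from the 5' and 3' ends, keeping seq and qual in sync."""
    end = max(len(rec.seq) - tail, front)  # never let the slice end before its start
    return FastqRecord(rec.name, rec.seq[front:end], rec.qual[front:end])


# A 151-cycle read from a 150-cycle run with one extra cycle
rec = FastqRecord("read1", "ACGT" * 37 + "ACG", "I" * 151)
trimmed = global_trim(rec, tail=1)  # analogous to --trim_tail=1
print(len(trimmed.seq), len(trimmed.qual))  # 150 150
```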

mdtorohernando commented 7 months ago

Hi! I'm trying to cut polyA tails with fastp, and I've run it twice with the parameters in a different order.

First run:

fastp -i S8_L001_R1_001.fastq.gz -I S8_L001_R2_001.fastq.gz -o out1.clean.fq.gz -O out2.clean.fq.gz --trim_poly_g --trim_poly_x --detect_adapter_for_pe --cut_front 25 --cut_tail 25 --cut_mean_quality 25 -l 100 -h report_fastp.html

Then, after reading this issue, I ran it with the polyG and polyX trimming options placed after the adapter option:

fastp -i S8_L001_R1_001.fastq.gz -I S8_L001_R2_001.fastq.gz -o out1.clean.fq.gz -O out2.clean.fq.gz --cut_front 25 --cut_tail 25 --cut_mean_quality 25 --detect_adapter_for_pe --trim_poly_g --trim_poly_x -l 100 -h report_fastp.html

However, it doesn't seem to work: polyA still remains at the end of the reads (MultiQC report attached).

[Screenshot: MultiQC report, 2024-01-31 22:27:04]

fastp v0.23.4

Any suggestions? Thanks!
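Swapping the position of flags on the command line is not the same as changing the processing order: as stated earlier in this thread, the order of the steps is fixed in fastp's source. The sketch below illustrates the general point with Python's argparse as a stand-in; the parser, the flag subset and STEP_ORDER are hypothetical and are not fastp's parser (fastp is written in C++) or its actual step order.

```python
import argparse


def build_parser():
    # Toy parser reusing a few flag names from the commands above; NOT fastp's parser
    p = argparse.ArgumentParser(prog="toy_trimmer")
    p.add_argument("--trim_poly_g", action="store_true")
    p.add_argument("--trim_poly_x", action="store_true")
    p.add_argument("--detect_adapter_for_pe", action="store_true")
    p.add_argument("--cut_mean_quality", type=int, default=20)
    return p


# Hypothetical order, hard-coded in the program (illustration only, not fastp's order)
STEP_ORDER = ["detect_adapter_for_pe", "trim_poly_g", "trim_poly_x"]


def planned_steps(argv):
    """Return the enabled steps in the program's fixed order, whatever the argv order."""
    args = vars(build_parser().parse_args(argv))
    return [step for step in STEP_ORDER if args[step]]


run1 = ["--trim_poly_g", "--trim_poly_x", "--detect_adapter_for_pe", "--cut_mean_quality", "25"]
run2 = ["--cut_mean_quality", "25", "--detect_adapter_for_pe", "--trim_poly_g", "--trim_poly_x"]
print(planned_steps(run1) == planned_steps(run2))  # True
```

Both argument orders produce the same parsed options, so the steps run in the same program-defined order.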

Poocee commented 1 month ago

I have the same problem.