OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Very short fragments #540

Open SilasK opened 10 months ago

SilasK commented 10 months ago

I have a small issue. I ran fastp with the command line below.

It detects a high duplication rate (30%), and some of the reads seem very short, as seen in the insert size histogram.

The base histograms show some weird fluctuation at the beginning, which I take to be primer dimers.

My question: if there are short inserts where the paired reads overlap, the reads cannot be longer than the insert, so they should be removed, shouldn't they?
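
To make the question concrete, here is a minimal Python sketch of the geometry involved. It assumes that adapter trimming removes everything past the end of the insert, so a trimmed read can be at most as long as the insert. The function names are hypothetical and this is not fastp's actual implementation. The 151-cycle read length, the insert size peak of 31, and the `--length_required 75` threshold are taken from the report and command in this post.

```python
def trimmed_read_length(read_cycles: int, insert_size: int) -> int:
    """Expected read length after read-through adapter bases are removed.

    Assumption: trimming is perfect, so nothing beyond the insert survives.
    """
    return min(read_cycles, insert_size)


def pair_overlap(read_cycles: int, insert_size: int) -> int:
    """Number of insert bases covered by both mates (0 if they do not overlap)."""
    covered_per_mate = min(read_cycles, insert_size)
    return max(0, 2 * covered_per_mate - insert_size)


def passes_length_filter(read_cycles: int, insert_size: int, length_required: int) -> bool:
    """True if the trimmed read would still meet the minimum length threshold."""
    return trimmed_read_length(read_cycles, insert_size) >= length_required


# Values from this post: 151 cycles per mate, insert size peak 31, threshold 75.
short_insert_length = trimmed_read_length(151, 31)
short_insert_kept = passes_length_filter(151, 31, 75)
```

Under these assumptions, a pair from a 31 bp insert would be trimmed to at most 31 bases per mate and would fall below the 75-base threshold. Whether that happens in practice depends on the adapters actually being detected and trimmed, which this sketch does not model.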

fastp version: 0.23.4 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 149bp, 149bp
mean length after filtering: 148bp, 148bp
duplication rate: 62.784168%
Insert size peak: 31
total reads: 36.535150 M
total bases: 5.447834 G
Q20 bases: 5.313695 G (97.537758%)
Q30 bases: 5.074262 G (93.142749%)
GC content: 49.901514%
reads passed filters: 36.275760 M (99.290026%)
reads corrected: 920 (0.002518%)
bases corrected: 1.166000 K (0.000021%)
reads with low quality: 0 (0.000000%)
reads with too many N: 342 (0.000936%)
reads too short: 258.562000 K (0.707708%)
reads with low complexity: 486 (0.001330%)
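
As a sanity check on the report above, the filter counts can be recomputed with a few lines of Python. This is only a sketch: it assumes that `M` and `K` in the report mean exactly 10^6 and 10^3, and the variable names are made up for illustration.

```python
# Counts copied from the fastp report in this post (M = 1e6, K = 1e3 assumed).
total_reads = 36_535_150
passed_reads = 36_275_760
failed_reads = {
    "low quality": 0,
    "too many N": 342,
    "too short": 258_562,
    "low complexity": 486,
}


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


failed_total = sum(failed_reads.values())
too_short_pct = percent(failed_reads["too short"], total_reads)
passed_pct = percent(passed_reads, total_reads)

# The four failure categories should account for every read that did not pass.
counts_are_consistent = failed_total == total_reads - passed_reads
```

With the numbers from this report the categories add up, and fewer than 1% of reads are discarded as too short, even though the insert size peak is 31.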
fastp --thread 8 --qualified_quality_phred 8 --length_required 75 --low_complexity_filter --detect_adapter_for_pe --correction --cut_tail --dup_calc_accuracy 5 --cut_mean_quality 15 --dedup --in1 Sample_R1.fastq.gz --in2 Sample_R1.fastq.gz --out1 qc_sample_R1.fastq.gz --out2 qc_sample_R2.fastq.gz --json report.json --html report.html