OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Warning label: different read numbers in pack #378

Open bamorim-bio opened 2 years ago

bamorim-bio commented 2 years ago

Hi, I was hoping you could clarify a warning that I got:

fastp --in1 /Users/beatrizamorim/Desktop/mtDNA/data/DEMI115_R1.fastq.gz \
          --in2 /Users/beatrizamorim/Desktop/mtDNA/data/DEMI115_R2.fastq.gz \
          --detect_adapter_for_pe \
          -c \
          -p \
          --qualified_quality_phred 30 \
          --dedup --out1 DEMI115_R1.qc.fq.gz \
          --out2 DEMI115_R2.qc.fq.gz \
          --unpaired1 singletons.DEMI115.qc.fq.gz \
          --unpaired2 singletons.DEMI115.qc.fq.gz \
          --failed_out failed.DEMI115.qc.fq.gz \
          --json fastp.DEMI115.json \
          --html fastp.DEMI115.html \
          --thread 4

Detecting adapter sequence for read1...
No adapter detected for read1

Detecting adapter sequence for read2...

Illumina TruSeq Adapter Read 2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

WARNNIG: different read numbers of the 932 pack
Read1 pack size: 1000
Read2 pack size: 91

This is the first time I've gotten an error like this; none of the samples before this one had any problem. The program got stuck at this warning.

Does this mean there is something wrong with my reads? Should I exclude this sample altogether?

I am using version 0.23.1.

georgia-katsoula commented 2 years ago

Hi @bamorim-bio I have the same warning. Did you figure this out?

sfchen commented 2 years ago

Can you consistently reproduce this issue when you rerun the command?

georgia-katsoula commented 2 years ago

Thank you for the quick response. Yes, I get this error for one of my samples (I reran it 3 times) and the process gets stuck. My command looks like this (it is part of a Snakemake file):

fastp \
          -i {input.fq1} \
          -o {output.trimmed_1} \
          -I {input.fq2} \
          -O {output.trimmed_2} \
          --unpaired1 {output.unpaired_1} \
          --unpaired2 {output.unpaired_2} \
          --failed_out {output.failed} \
          --detect_adapter_for_pe \
          --overrepresentation_analysis \
          --qualified_quality_phred 30 \
          --html {output.report_html} \
          --json {output.report_json} 2>&1 > {log}

Output:

Detecting adapter sequence for read1...
No adapter detected for read1

Detecting adapter sequence for read2...
CTCATTTACACCAACCACCCAACTATCTATAAACCTAGCCATGGCCATCCCCTTATGAGC

WARNNIG: different read numbers of the 22739 pack
Read1 pack size: 173
Read2 pack size: 1000

bamorim-bio commented 2 years ago

> Hi @bamorim-bio I have the same warning. Did you figure this out?

Hi! I did. I figured out that this error occurs with samples that have a different number of sequences in the read 1 and read 2 files. For example, my sample A had 1.7M sequences in read 1 and 1M in read 2. I had to find a way to fix this because the error kept coming up in my downstream analysis (even when I found other software like fastp that could run with these unequal read counts, I also got errors when aligning with BWA).

What ended up working for me was repairing the reads with BBTools:

repair.sh -Xmx14g \
          in1=SampleA_R1.fastq.gz \
          in2=SampleA_R2.fastq.gz \
          out1=SampleA_R1_repaired.fastq.gz \
          out2=SampleA_R2_repaired.fastq.gz \
          outs=/SampleA_single.fastq.gz \
          repair

Afterwards, fastp worked fine!
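
A quick way to check for this kind of mismatch before running fastp is to count the records in both files. The sketch below is a minimal example, assuming standard four-line FASTQ records; the function names `count_fastq_reads` and `pairs_match` are hypothetical helpers and not part of fastp or BBTools. It only compares counts, so it will not detect pairs that are present in equal numbers but out of order.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    # Transparently handle gzip-compressed input based on the file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(
            f"{path}: {n_lines} lines is not a multiple of 4 (truncated file?)"
        )
    return n_lines // 4


def pairs_match(r1_path, r2_path):
    """Return (is_equal, n_r1, n_r2) for a pair of FASTQ files."""
    n_r1 = count_fastq_reads(r1_path)
    n_r2 = count_fastq_reads(r2_path)
    return n_r1 == n_r2, n_r1, n_r2
```

For example, `pairs_match("SampleA_R1.fastq.gz", "SampleA_R2.fastq.gz")` would return `False` as its first element for a sample like the one described above, which is a hint that the pair needs repairing first.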

georgia-katsoula commented 2 years ago

Thank you so much @bamorim-bio for taking the time! I will try this out. :)

bamorim-bio commented 2 years ago

> Thank you so much @bamorim-bio for taking the time! I will try this out. :)

Let me know if you need help or if that didn't work for you!

I also forgot to mention that I noticed the reads had different numbers of sequences while doing QC with FastQC!

my email is bamorim@cibio.up.pt :)

LvLH commented 2 years ago

Hi, I have the same problem: fastp keeps running in the background with status "S" and never stops. Info in the log file:

WARNNIG: different read numbers of the 30908 pack
Read1 pack size: 224
Read2 pack size: 1000

jessicarowell commented 2 years ago

I think the fastp code should be altered, if possible, to catch the cause of this error and make the program exit gracefully instead of hanging indefinitely.

fastp -i /home/input/sample_1.fq.gz -q 20 -l 50 -o qc/sample_1.fq.gz -I /home/input/sample_2.fq.gz -O qc/sample_2.fq.gz --json qc/fastp.json --html qc/fastp.html --disable_adapter_trimming --failed_out qc/fail.fq.gz

Result after 15 hrs (I forgot and left it to run overnight):

WARNNIG: different read numbers of the 4614 pack
Read1 pack size: 169
Read2 pack size: 1000
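
Until the hang is handled inside fastp itself, one possible workaround is to run the command under a time limit so that a stuck job is killed instead of running overnight. This is only a sketch of that idea: `run_with_timeout` is a hypothetical helper, the time limit is something you would have to choose for your own data, and it does nothing to fix the underlying read-count mismatch.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) with a time limit.

    Returns the exit code, or None if the process exceeded timeout_s
    seconds and was killed.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return completed.returncode
```

It could be called with the same arguments as the command above passed as a list, for example `run_with_timeout(["fastp", "-i", "/home/input/sample_1.fq.gz", "-I", "/home/input/sample_2.fq.gz", "-o", "qc/sample_1.fq.gz", "-O", "qc/sample_2.fq.gz"], 3600)`, where the one-hour limit is an arbitrary placeholder. A return value of `None` then flags the sample for a closer look.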

vinisalazar commented 1 year ago

Hi @bamorim-bio, thanks for posting the BBTools fix. I still get the "different read numbers" warning, but fastp now runs to the end!

vinisalazar commented 1 year ago

@sfchen thank you for providing fastp. It's amazing.

Would you have an estimate as to when you could incorporate these fixes and release a new version? I'm sure it would benefit many users.

Best, V