FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Paired end trimming WGBS #75

Closed demis001 closed 4 years ago

demis001 commented 4 years ago

@FelixKrueger

I keep getting intermediate files left over at the end of the run. Any idea why?

trim_galore -j 4 --paired --retain_unpaired --2colour 20 --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 10 --three_prime_clip_r2 10 --output_dir trimmed_fastq TG6_L001_R1.fq.gz TG6_L001_R2.fq.gz

Output:

TG6_L001_R1_trimmed.fq.gz TG6_L001_R1_unpaired_1.fq.gz TG6_L001_R1_val_1.fq.gz TG6_L001_R2_trimmed.fq.gz TG6_L001_R2_unpaired_2.fq.gz TG6_L001_R2_val_2.fq.gz

This happened for many samples. I checked: there were no errors and the runs completed without problems. I am using the current versions of Cutadapt and trim_galore.

@demis001

FelixKrueger commented 4 years ago

I'll take a look to see what is going on.

demis001 commented 4 years ago

This is the version I am using

> ./trim_galore --version

                        Quality-/Adapter-/RRBS-/Speciality-Trimming
                                [powered by Cutadapt]
                                  version 0.6.4_dev

                               Last update: 24 09 2019

FelixKrueger commented 4 years ago

Hmm, the same command

trim_galore -j 4 --paired --retain_unpaired --2colour 20 --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 10 --three_prime_clip_r2 10 --output_dir trimmed_fastq smallRNA_100K_R1.fastq.gz smallRNA_100K_R2.fastq.gz

with some local test files results in the following output folder:

$ ls trimmed_fastq

smallRNA_100K_R1.fastq.gz_trimming_report.txt
smallRNA_100K_R1_unpaired_1.fq.gz
smallRNA_100K_R1_val_1.fq.gz
smallRNA_100K_R2.fastq.gz_trimming_report.txt
smallRNA_100K_R2_unpaired_2.fq.gz
smallRNA_100K_R2_val_2.fq.gz

So it all seems to work well over here. I am not really sure why this is happening at your end. My version is 0.6.5.

demis001 commented 4 years ago

I ran over 100 samples, and this happened for 20 of them; I got the correct output for the other 80. I noticed low alignment efficiency, went back to check, and this is what I found. I will try the older versions.
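With this many samples, one way to find the affected ones is to scan the output directory for `_trimmed.fq.gz` files, which in the listings above appear only in the problematic run. This is a minimal sketch: the helper name `list_intermediates` is hypothetical and not part of Trim Galore, and it assumes `--paired` runs, since in single-end mode `_trimmed.fq.gz` is the final output.

```shell
# Hypothetical helper: print leftover *_trimmed.fq.gz intermediates
# found directly inside the given output directory (one per line).
list_intermediates() {
    for f in "$1"/*_trimmed.fq.gz; do
        # The glob stays unexpanded if nothing matches, so test for existence.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Example (directory name taken from the command in this thread):
# list_intermediates trimmed_fastq
```

Any sample listed here would be worth re-checking in its log before moving on to alignment.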

FelixKrueger commented 4 years ago

Maybe it had to do with the parallel processing, and file synchronization issues? I would probably first try out the same command but dropping the -j 4 (and upgrading to the latest version).

demis001 commented 4 years ago

I suspected that as well, and I am running one of the failed samples without it right now.

I also merged the data from two lanes before trimming. Do you think that could cause a problem?

cat XXX_6008191213A6/TG6_L00[12]_R1_001.fastq.gz XXXX191220B6/TG6_L00[12]_R1_001.fastq.gz > XXX_6008_merged/TG6_L001_R1.fq.gz

FelixKrueger commented 4 years ago

I am not sure whether this could have to do with it. I remember from some time ago (specifically for FastQC processing of merged FastQ files) that under certain circumstances

cat *_L00[12]_R1_001.fastq.gz > merged.fastq.gz

was not equivalent to:

zcat *_L00[12]_R1_001.fastq.gz | gzip -c - > merged_fastq.gz

It had something to do with (invisible) headers in the files (this might be googlable).
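One way to check whether the two merge strategies differ for a given set of files is to compare checksums of the decompressed content. The sketch below assumes `gzip` and the POSIX `cksum` utility are available; the helper name `decompressed_cksum` and the file names in the usage comments are hypothetical.

```shell
# Hypothetical helper: checksum and byte count of the decompressed content.
# gzip -cd reads every member of a concatenated (multi-member) gzip file.
decompressed_cksum() {
    gzip -cd "$1" | cksum
}

# Example with placeholder file names:
# cat lane1_R1.fastq.gz lane2_R1.fastq.gz > merged_cat.fastq.gz
# gzip -cd lane1_R1.fastq.gz lane2_R1.fastq.gz | gzip -c > merged_recompressed.fastq.gz
# if [ "$(decompressed_cksum merged_cat.fastq.gz)" = \
#      "$(decompressed_cksum merged_recompressed.fastq.gz)" ]; then
#     echo "decompressed content is identical"
# fi
```

If the checksums match, any difference in downstream behaviour would come from how a tool reads multi-member gzip files rather than from the sequence data itself.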

demis001 commented 4 years ago

I see this in one of the log files. What does it mean, and how can I fix it?

Read 2 output is truncated at sequence count: 57529710, please check your paired-end input files! Terminating...

FelixKrueger commented 4 years ago

Ah, there we go: some files did not have the same number of sequences. Any chance it has to do with the merging somehow?
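A quick pre-flight check for this is to compare the read counts of R1 and R2 before trimming. This is a minimal sketch that assumes standard four-line FastQ records; the helper names `count_reads` and `check_pair` are hypothetical and not part of Trim Galore.

```shell
# Hypothetical helper: number of reads in a gzipped FastQ file,
# assuming exactly four lines per record (no wrapped sequences).
count_reads() {
    lines=$(gzip -cd "$1" | wc -l)
    echo $((lines / 4))
}

# Hypothetical helper: succeed only if both mates hold the same number of reads.
check_pair() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "Read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "OK: $n1 read pairs"
}

# Example (file names from the command in this thread):
# check_pair TG6_L001_R1.fq.gz TG6_L001_R2.fq.gz
```

Equal counts do not prove the mates are in the same order, but unequal counts are enough to explain a truncation error like the one above.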

demis001 commented 4 years ago

What happened was that the sequencing core sent me something like R1 R2 R2, and then R1 R2 R1 R2 for the second lane. I didn't want to throw any of it away.

demis001 commented 4 years ago

In lane 1, I have R1. In lane 2, I have R1 and R2.

I then merged R1 from lane 1 and lane 2, but used only R2 from lane 2. As a result, the merged R1 file is bigger than the R2 file.
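Paired-end trimming expects R1 and R2 to contain the same reads in the same order, so one option is to let a lane contribute to the merged pair only when both of its mates are present. The sketch below is an assumption about how that could be done, not a recommendation from the Trim Galore author: the helper name `merge_complete_lanes` and the `<lane>_R1.fastq.gz` / `<lane>_R2.fastq.gz` naming are hypothetical. Lanes with only one mate are skipped and would need to be handled separately.

```shell
# Hypothetical helper: merge only lanes that have both mates.
# Usage: merge_complete_lanes <output_prefix> <lane_prefix>...
# Assumes each lane is stored as <lane_prefix>_R1.fastq.gz and <lane_prefix>_R2.fastq.gz.
merge_complete_lanes() {
    out_prefix=$1
    shift
    for mate in R1 R2; do
        # Lanes are visited in the same order for both mates, so pairs stay in sync.
        for lane in "$@"; do
            if [ -f "${lane}_R1.fastq.gz" ] && [ -f "${lane}_R2.fastq.gz" ]; then
                gzip -cd "${lane}_${mate}.fastq.gz"
            elif [ "$mate" = R1 ]; then
                echo "Skipping ${lane}: R1 and R2 are not both present" >&2
            fi
        done | gzip -c > "${out_prefix}_${mate}.fq.gz"
    done
}

# Example with placeholder paths:
# merge_complete_lanes merged/TG6 run1/TG6_L001 run1/TG6_L002
```

Decompressing and recompressing once per mate also avoids producing a multi-member gzip file.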

demis001 commented 4 years ago

Thanks, found the problem! It was caused by an unequal number of R1 and R2 reads.

@demis001

FelixKrueger commented 4 years ago

OK good, I am hopeful that you can get that sorted. Closing this issue if that's OK