Paired end trimming WGBS

demis001 commented 4 years ago

@FelixKrueger

I keep getting intermediate files left behind at the end of the run. Any idea why?

trim_galore -j 4 --paired --retain_unpaired --2colour 20 --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 10 --three_prime_clip_r2 10 --output_dir trimmed_fastq TG6_L001_R1.fq.gz TG6_L001_R2.fq.gz

Output:

TG6_L001_R1_trimmed.fq.gz TG6_L001_R1_unpaired_1.fq.gz TG6_L001_R1_val_1.fq.gz TG6_L001_R2_trimmed.fq.gz TG6_L001_R2_unpaired_2.fq.gz TG6_L001_R2_val_2.fq.gz

This happened for many samples. I checked: there was no error and the run completed without a problem. I am using the current Cutadapt and trim_galore.


FelixKrueger commented 4 years ago

I'll take a look to see what is going on.

demis001 commented 4 years ago

This is the version I am using

> ./trim_galore --version

                        Quality-/Adapter-/RRBS-/Speciality-Trimming
                                [powered by Cutadapt]
                                  version 0.6.4_dev

                               Last update: 24 09 2019

FelixKrueger commented 4 years ago

Hmm, the same command

trim_galore -j 4 --paired --retain_unpaired --2colour 20 --clip_r1 6 --clip_r2 6 --three_prime_clip_r1 10 --three_prime_clip_r2 10 --output_dir trimmed_fastq smallRNA_100K_R1.fastq.gz smallRNA_100K_R2.fastq.gz

with some local test files results in the following output folder:

$ ls trimmed_fastq

smallRNA_100K_R1.fastq.gz_trimming_report.txt
smallRNA_100K_R1_unpaired_1.fq.gz
smallRNA_100K_R1_val_1.fq.gz
smallRNA_100K_R2.fastq.gz_trimming_report.txt
smallRNA_100K_R2_unpaired_2.fq.gz
smallRNA_100K_R2_val_2.fq.gz

So it all seems to work well over here. I am not really sure why this is happening at your end. My version is 0.6.5.

demis001 commented 4 years ago

I ran over 100 samples, and this happened for 20 of them. I got the correct output for the remaining 80 samples. I noticed low alignment efficiency, checked back, and found that this is what had happened. I will try the older versions.
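With this many samples, the affected ones can be picked out by scanning the output folder for leftover intermediates. The following is a minimal sketch, assuming (based only on the two listings in this thread) that a `*_trimmed.fq.gz` file remaining in the output directory marks a paired-end run that did not finish its validation step. The function name `leftover_intermediates` is hypothetical and not part of Trim Galore.

```python
import tempfile
from pathlib import Path


def leftover_intermediates(output_dir):
    """List *_trimmed.fq.gz files still present in output_dir, sorted by name.

    Assumption: in a paired-end run these files are intermediates, so finding
    one suggests the run for that sample stopped early.
    """
    return sorted(p.name for p in Path(output_dir).glob("*_trimmed.fq.gz"))


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics part of the listing above.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("TG6_L001_R1_trimmed.fq.gz", "TG6_L001_R1_val_1.fq.gz"):
            (Path(tmp) / name).touch()
        print(leftover_intermediates(tmp))
```

Pointing the function at the real `trimmed_fastq` folder would give the list of samples whose logs are worth reading first.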

FelixKrueger commented 4 years ago

Maybe it had to do with the parallel processing, and file synchronization issues? I would probably first try out the same command but dropping the -j 4 (and upgrading to the latest version).

demis001 commented 4 years ago

I also suspected that, and I am running without it for one of the failed samples right now.

I also merged the data from two lanes before doing that. Do you think that could create a problem?

cat XXX_6008191213A6/TG6_L00[12]_R1_001.fastq.gz XXXX191220B6/TG6_L00[12]_R1_001.fastq.gz > XXX_6008_merged/TG6_L001_R1.fq.gz

On Wed, Jan 15, 2020 at 11:54 AM Felix Krueger notifications@github.com wrote:

> Maybe it had to do with the parallel processing, and file synchronization issues? I would probably first try out the same command but dropping the -j 4 (and upgrading to the latest version).


FelixKrueger commented 4 years ago

I am not sure if this could have something to do with it. I remember from some time ago (specifically for FastQC processing of merged FastQ files) that under certain circumstances

cat *_L00[12]_R1_001.fastq.gz > merged.fastq.gz

was not equivalent to:

zcat *_L00[12]_R1_001.fastq.gz | gzip -c - > merged_fastq.gz

It had something to do with (invisible) headers in the files (this might be googlable).
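One way to see the difference is that concatenating gzip files produces a single file containing several gzip members, each with its own header, whereas decompressing and recompressing produces one member. The sketch below illustrates this with Python's standard `gzip` and `zlib` modules on two tiny in-memory chunks. Whether this multi-member layout was the actual cause of the FastQC behaviour mentioned above is an assumption, and the helper names are hypothetical.

```python
import gzip
import zlib


def concat_vs_recompress(chunks):
    """Build both variants of a merged file from uncompressed byte chunks."""
    # Equivalent of `cat a.gz b.gz`: compressed files joined byte for byte.
    concatenated = b"".join(gzip.compress(c) for c in chunks)
    # Equivalent of `zcat a.gz b.gz | gzip -c`: one fresh gzip stream.
    recompressed = gzip.compress(b"".join(chunks))
    return concatenated, recompressed


def count_gzip_members(data):
    """Count the gzip members (header + payload + trailer) in a byte string."""
    count = 0
    while data:
        # wbits = MAX_WBITS | 16 tells zlib to expect a gzip header.
        decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated gzip member")
        count += 1
        data = decomp.unused_data  # bytes that follow the finished member
    return count


if __name__ == "__main__":
    lanes = [b"@r1\nACGT\n+\nIIII\n", b"@r2\nTTTT\n+\nIIII\n"]
    cat_bytes, zcat_bytes = concat_vs_recompress(lanes)
    # Same decompressed content, different on-disk structure.
    print(gzip.decompress(cat_bytes) == gzip.decompress(zcat_bytes))
    print(count_gzip_members(cat_bytes), count_gzip_members(zcat_bytes))
```

A reader that handles multi-member gzip sees identical content in both files; a tool that stops after the first member would see only the first lane.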

demis001 commented 4 years ago

I see this in one of the log files. What does it mean, and how can I fix it?

Read 2 output is truncated at sequence count: 57529710, please check your paired-end input files! Terminating...

FelixKrueger commented 4 years ago

Ah, there we go: some files did not have the same number of sequences. Any chance it has to do with the merging somehow?

demis001 commented 4 years ago

What happened was that the sequencing core sent me something like this: R1 R2 R2, then repeated R1 R2 R1 R2 in the second lane. I didn't want to throw any of it away.

demis001 commented 4 years ago

In lane 1, I have R1. In lane 2, I have R1 and R2.

I then merged R1 from lane 1 and lane 2, and used R2 from lane 2 only, so R1 is larger than R2.

demis001 commented 4 years ago

Thanks, found the problem! It was caused by an unequal number of R1 and R2 reads.
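A mismatch like this can be caught before trimming by counting the reads in both files. The following is a minimal sketch, assuming standard four-line FASTQ records in gzip-compressed files; the function names and the demo file names are hypothetical.

```python
import gzip
import tempfile
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rb") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def check_pair(r1_path, r2_path):
    """Return (reads in R1, reads in R2, whether the two counts agree)."""
    n1 = count_fastq_reads(r1_path)
    n2 = count_fastq_reads(r2_path)
    return n1, n2, n1 == n2


if __name__ == "__main__":
    record = b"@read\nACGT\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        r1 = Path(tmp) / "demo_R1.fq.gz"
        r2 = Path(tmp) / "demo_R2.fq.gz"
        # R1 merged from two lanes, R2 taken from one lane only.
        r1.write_bytes(gzip.compress(record * 2) + gzip.compress(record))
        r2.write_bytes(gzip.compress(record))
        print(check_pair(r1, r2))
```

Equal counts do not prove that the reads are in matching order, so this is only a first sanity check on merged input.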


FelixKrueger commented 4 years ago

OK good, I am hopeful that you can get that sorted. Closing this issue, if that's OK.

FelixKrueger / TrimGalore

Paired end trimming WGBS #75