FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Read length vary for sequence in PE fastq files #98

Closed priyatamapandey closed 4 years ago

priyatamapandey commented 4 years ago

Hi, I ran Trim Galore on my paired-end DNA-seq data. When I then ran the STAR aligner on the trimmed files, it stopped with an error telling me to fix my FASTQ files. I looked and found that the reads are not the same length in the two files. Attached is an example of a read sequence that I checked.

[Screenshot: Screen Shot 2020-07-25 at 5 23 28 PM]

I think we don't expect two different lengths for the same read in paired-end files. Please help me fix this error. Thanks, Priya

FelixKrueger commented 4 years ago

Hi Priya,

Trim Galore should not produce files of different length. Indeed, if it encounters files of different length it should throw an error and die. Which version of Trim Galore were you using, what was the exact command and did you see any errors?

Can you check the length of the input files? e.g. with zcat SRR102...fastq.gz | wc -l, and do this for R1 and R2. Cheers, Felix
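The check suggested above can be wrapped in two small helper functions. This is only a sketch: the function names `count_reads` and `compare_pair` are made up for illustration, it assumes standard 4-line FASTQ records, and it uses `gzip -dc` in place of `zcat` because `zcat` behaves differently on some systems.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Trim Galore.

# Count reads in a gzipped FASTQ file, assuming 4 lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Print the read counts of R1 and R2 and succeed only if they match.
compare_pair() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

Usage would look like `compare_pair sample_R1.fastq.gz sample_R2.fastq.gz`, where both file names are placeholders.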

priyatamapandey commented 4 years ago

Hi Felix, Thank you for your quick reply. I am using the latest version, TrimGalore-0.6.5. The command I used is as follows:

trim_galore \
  --illumina \
  --paired \
  --fastqc \
  -o $outfile \
  $fastq1 $fastq2

My DNA-seq data is from the Complete Genomics platform. I don't think it has adapters, but I am not sure. I was a little hesitant to use the --illumina option. What do you think about the command?

Here is a screenshot of the lengths.

[Screenshot: Screen Shot 2020-07-27 at 8 10 17 AM]

Also, I looked at the log file for this PE run (attached here). It reports Trim Galore version: 0.6.4_dev. I hope the two versions are the same.

trimGalore_7788089_2.errcd.zip

FelixKrueger commented 4 years ago

Oh, I think I might have misread your initial email... Are you saying that the files contain the same number of reads, but that the reads are not of the same length? Does FastQC report the same number of reads for both trimmed (_val_1 and _val_2) files? Is that a problem for STAR? But yeah, if you subject a read to adapter and quality trimming, of course the read length will not be the same for the reads in both files; this is the whole idea of trimming... (By the way, I am not sure that reads that are only 35bp long need to be trimmed anyway, as adapter contamination is probably very rare indeed for reads that short....)
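To see how much the read lengths vary after trimming, one could tabulate them per file. The following is a rough sketch, assuming gzipped FASTQ with 4 lines per record; `length_histogram` is a made-up helper name, not a Trim Galore feature.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "length count" pairs for a gzipped FASTQ file.
# The sequence is line 2 of every 4-line record.
length_histogram() {
    gzip -dc "$1" \
        | awk 'NR % 4 == 2 { n[length($0)]++ } END { for (l in n) print l, n[l] }' \
        | sort -n
}
```

Running it on both trimmed files (the ones ending in _val_1 and _val_2) should show a spread of lengths in each, while the total number of reads stays the same in the two files.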

priyatamapandey commented 4 years ago

Hi Felix, Yes, FastQC reports the same number of reads for both trimmed files. I checked, and for most files the trimmed.fastq files are deleted automatically after the final trimmed files (val_1 and val_2) are generated, but for a few they are still there. Those runs had some errors, so I should rerun them. My files are WGS files and pretty big, so Trim Galore takes a lot of time. Can I rerun Trim Galore so that it resumes where it stopped, or from the point where the problem occurred?

Thank you, Priya

FelixKrueger commented 4 years ago

I am afraid Trim Galore has no functionality to resume failed runs unfortunately.... But if you have to re-do the splitting I would recommend you use -j 4 to cut the time down to a fraction (this requires Python3 and pigz installed though). Best, Felix
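A re-run with -j 4 might be wrapped as below. This is a sketch only: `check_tools` and `rerun_trimming` are made-up names, and `$outfile`, `$fastq1` and `$fastq2` are the variables from the command posted earlier in this thread. The prerequisite check simply looks for the programs on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every named program is on PATH,
# reporting the missing ones on stderr.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Re-run the paired-end trimming from scratch with 4 cores.
# Same options as the original command, plus -j 4.
rerun_trimming() {
    check_tools trim_galore python3 pigz || return 1
    trim_galore --illumina --paired --fastqc -j 4 \
        -o "$outfile" "$fastq1" "$fastq2"
}
```

Since there is no resume functionality, `rerun_trimming` starts over on the full input files.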

priyatamapandey commented 4 years ago

Hi Felix, Thank you for your help. I am running Trim Galore on an HPC. Is there any reason for picking -j 4? Or can this number be increased based on the number of cores available?

Best, Priya University of Southern California Keck School of Medicine

FelixKrueger commented 4 years ago

Hi Priya,

About the installation of pigz you would need to ask your HPC people...

Regarding the number of cores, we found that -j 4 seems to be a kind of sweet spot, after that you seem to get diminishing returns (compare the release notes here: https://github.com/FelixKrueger/TrimGalore/releases/tag/0.6.0)

priyatamapandey commented 4 years ago

Hi Felix, Thank you for introducing this. I had already set -j 12 and only now noticed your comment about diminishing returns. I take it this refers to run time, not to the data in my output files. I put this in my pipeline, where I have given 12 cores to mapping and the other parallel tools, so Trim Galore also picked up 12 cores. Please confirm.

Thank you, Priya

FelixKrueger commented 4 years ago

Yes, the diminishing returns here means that you will not get a linear increase in speed anymore, but it shouldn't have any impact on the data as such.
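In a pipeline that hands the same core count to every tool, the value passed to Trim Galore could be capped separately. A minimal sketch, where `trim_cores` is a made-up helper and the default cap of 4 is taken from the sweet spot mentioned above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: echo the smaller of the available cores ($1)
# and a cap ($2, defaulting to 4).
trim_cores() {
    local available=$1 cap=${2:-4}
    if [ "$available" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$available"
    fi
}
```

With 12 cores allocated, `trim_galore -j "$(trim_cores 12)" ...` would run with -j 4, while the mapping step can still use all 12.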

priyatamapandey commented 4 years ago

Thank you for your quick reply and for introducing me to this threading option. It made my work faster than before.

Best, Priya