FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

TrimGalore version 0.6.6 thinks Cutadapt version 2.11 is <1.0 #105

Closed: mariadelmarq closed this issue 3 years ago

mariadelmarq commented 3 years ago

Hi,

We've installed Trim Galore and Cutadapt in a Singularity container, and we're getting some strange errors when running it on publicly available files (https://www.ebi.ac.uk/ena/browser/view/PRJNA277916). I believe you've added a check that aborts if the Cutadapt version is too old (<1.0), and it seems to be triggering in error for us. Any tips? Log below:

Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 2.11.dev16+gdfd7273
Cutadapt seems to be using Python 3! Proceeding with multi-core enabled Cutadapt using 6 cores
Proceeding with 'gzip' for compression. PLEASE NOTE: Using multi-cores for trimming with 'gzip' only has only very limited effect! (see here: https://github.com/FelixKrueger/TrimGalore/issues/16#issuecomment-458557103)
To increase performance, please install 'pigz' and run again

Using user-specified basename (>>SRR2068095<<) instead of deriving the filename from the input file(s)

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> SRR2068095_1.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence        Sequences analysed      Percentage
Illumina        3740    AGATCGGAAGAGC   1000000                 0.37
smallRNA        2       TGGAATTCTCGG    1000000                 0.00
Nextera         1       CTGTCTCTTATA    1000000                 0.00
Using Illumina adapter for trimming (count: 3740). Second best hit was smallRNA (count: 2)

Writing report to 'SRR2068095_1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: SRR2068095_1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 2.11.dev16+gdfd7273
Python version: 3.6.9
Number of cores used for trimming: 6
Quality Phred score cutoff: 30
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 50 bp
All Read 1 sequences will be trimmed by 10 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 10 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
All Read 1 sequences will be trimmed by 5 bp from their 3' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 5 bp from their 3' end to avoid poor qualities or biases
Output file(s) will be GZIP compressed

Use of uninitialized value $major_version in numeric eq (==) at /local_build/bin/trim_galore line 794.
Use of uninitialized value $major_version in numeric gt (>) at /local_build/bin/trim_galore line 811.
Cutadapt major version was not 1 or higher. Simply too old...
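The two Perl warnings above hint at why a version string that fails to parse is reported as "too old". Judging only from the messages (a numeric `==` comparison at line 794 and a numeric `>` comparison at line 811 of `trim_galore`), `$major_version` is compared as a number; Perl treats an undefined scalar as 0 in numeric context (emitting "Use of uninitialized value"), so neither comparison succeeds and the abort branch runs. The Python sketch below is a hypothetical reconstruction of that control flow, not Trim Galore's actual code: the function name `check_cutadapt_major` and the "accepted" return strings are made up for illustration; only the abort message is quoted from the log.

```python
from typing import Optional

# Abort message quoted verbatim from the log above.
ABORT_MESSAGE = "Cutadapt major version was not 1 or higher. Simply too old..."


def check_cutadapt_major(major_version: Optional[int]) -> str:
    """Hypothetical reconstruction of the version gate implied by the log.

    Perl numifies an undefined $major_version to 0 (with a warning),
    so this sketch emulates that by mapping None to 0.
    """
    numeric = 0 if major_version is None else major_version
    if numeric == 1:  # mirrors the '==' comparison reported at line 794
        return "accepted: Cutadapt 1.x"
    if numeric > 1:  # mirrors the '>' comparison reported at line 811
        return "accepted: Cutadapt 2.x or newer"
    # Falls through when the version was never parsed (None -> 0).
    return ABORT_MESSAGE


print(check_cutadapt_major(2))     # accepted: Cutadapt 2.x or newer
print(check_cutadapt_major(None))  # Cutadapt major version was not 1 or higher. Simply too old...
```

In other words, the "too old" message is what a parsing failure looks like, not a real comparison against 2.11.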

FelixKrueger commented 3 years ago

Hi Maria,

Thanks for the report. Looking at the regex, I agree that a version like 2.11.dev16+gdfd7273 would not have been captured. I have therefore changed the regular expression extracting the version to /(\d+\.\d+)\..+/, and I hope it'll do the trick.
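To see what the pattern quoted above captures, here is a small Python sketch; Python's `re` handles `(\d+\.\d+)\..+` the same way Perl does for these inputs. The helper name `extract_major_minor` is made up for illustration, and Trim Galore itself may apply the pattern differently (for example alongside other patterns). Note that, as quoted, the pattern needs at least one more dot-separated component after MAJOR.MINOR, so a bare two-part string such as `3.4` does not match it on its own.

```python
import re
from typing import Optional, Tuple

# The pattern quoted in the reply, written as a Python raw string.
QUOTED_PATTERN = re.compile(r"(\d+\.\d+)\..+")


def extract_major_minor(version: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) captured by the quoted pattern, or None."""
    match = QUOTED_PATTERN.search(version)
    if match is None:
        return None
    major, minor = match.group(1).split(".")
    return int(major), int(minor)


print(extract_major_minor("2.11.dev16+gdfd7273"))  # (2, 11)
print(extract_major_minor("2.10.1"))               # (2, 10)
print(extract_major_minor("3.4"))                  # None: no third component
```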

Can you please clone the current development version (e.g. git clone https://github.com/FelixKrueger/TrimGalore.git) and see if that works in your hands? And while you are at it, I would also suggest installing pigz in the Singularity container so that you can make use of the parallelisation.

Let me know how you get on.

mariadelmarq commented 3 years ago

Hi @FelixKrueger, thanks heaps for the quick response!

I reverted to Cutadapt 2.10 (I didn't realise we were using a development version instead of the latest stable one), but I'll try to make some time to check whether your latest development version works with Cutadapt's latest development version.

Thank you also for the pigz suggestion, I'm testing it right now 👍

apposada commented 3 years ago

Hi, I am having a similar issue (using Trim Galore from Bioconda). Below is the output of my command. I am aware that there are some Perl locale warnings. At first I was not overly concerned about them, since Trim Galore ran fine despite the warnings; now, however, it stops with a similar error.

My Trim Galore version is 0.6.6.

Thanks in advance!

Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3.4
single-core operation.
Output will be written into the directory: /home/aperpos/test/v02_ok/04_trimgalore/trimmed_reads/

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> ../03_python_filter_uncorrected/unfixrm_r1.cor.fq <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence        Sequences analysed      Percentage
Illumina        63      AGATCGGAAGAGC   51072                   0.12
Nextera         1       CTGTCTCTTATA    51072                   0.00
smallRNA        0       TGGAATTCTCGG    51072                   0.00
Using Illumina adapter for trimming (count: 63). Second best hit was Nextera (count: 1)

Writing report to '/home/aperpos/test/v02_ok/04_trimgalore/trimmed_reads/unfixrm_r1.cor.fq_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: ../03_python_filter_uncorrected/unfixrm_r1.cor.fq
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
3.4
Number of cores used for trimming: 1
Quality Phred score cutoff: 5
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Length cut-off for read 1: 35 bp (default)
Length cut-off for read 2: 35 bb (default)

Use of uninitialized value $major_version in numeric eq (==) at /home/aperpos/programs/miniconda3/envs/trinity_venv/bin/trim_galore line 794.
Use of uninitialized value $major_version in numeric gt (>) at /home/aperpos/programs/miniconda3/envs/trinity_venv/bin/trim_galore line 811.
Cutadapt major version was not 1 or higher. Simply too old...

FelixKrueger commented 3 years ago

Hmm, this seems to have to do with setting the locale (LC_ALL)... I'm not exactly sure why you are seeing this error; could you maybe ask a friendly local sysadmin to fix it for you?
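In the second log, the string reported as the Cutadapt version is the locale warning followed by a line break and `3.4`, so a pattern that expects the version at the start of the captured text can fail to find it. One defensive approach, sketched here in Python under the assumption that the version appears alone on its own line (as `3.4` does above), is to scan the captured output line by line and keep the last line that looks like a version number. The names `VERSION_LINE` and `parse_version_output` are hypothetical; this is not how Trim Galore parses the version.

```python
import re
from typing import Optional, Tuple

# A line consisting only of a version such as "3.4", "2.10" or
# "2.11.dev16+gdfd7273" (assumed format; adjust if cutadapt prints more).
VERSION_LINE = re.compile(r"^(\d+)\.(\d+)[\w.+-]*$")


def parse_version_output(captured: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the last version-like line, ignoring noise
    such as shell locale warnings; return None if no line matches."""
    for line in reversed(captured.splitlines()):
        match = VERSION_LINE.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    return None


polluted = (
    "/bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)\n"
    "3.4\n"
)
print(parse_version_output(polluted))  # (3, 4)
```

Scanning from the end favours the actual `--version` output over any warnings printed before it, at the cost of assuming the version is the last matching line.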

If you are happy that Cutadapt is working on your system, a quick (and rather dirty) hack would be to go to line 791 of Trim Galore (https://github.com/FelixKrueger/TrimGalore/blob/e9b8fd847f4da01fa3b886d134bc2ecd447a8068/trim_galore#L791) and change it to:

my ($major_version,$sub_version) = (3,4);

But I think it would be good to fix this locale issue; maybe also talk to the guys at Bioconda? Sorry I can't be of more help...
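The warning `setlocale: LC_ALL: cannot change locale (en_US.UTF-8)` indicates that LC_ALL names a locale that is not installed on the machine. Besides having that locale installed (the sysadmin route), one workaround when calling Cutadapt from your own script is to override LC_ALL with the `C` locale, which POSIX systems always provide, for that child process only. The Python sketch below assumes `cutadapt` is on the PATH; the helper names `env_with_c_locale` and `cutadapt_version` are hypothetical, and whether the `C` locale suits the rest of your pipeline is a judgment call.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_with_c_locale(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with LC_ALL forced to the POSIX "C"
    locale, which is always available, so setlocale() cannot fail on it."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LANG and the other LC_* variables
    return env


def cutadapt_version() -> str:
    """Run 'cutadapt --version' (assumed to be on PATH) under the C locale and
    return stdout only; stderr is captured separately, so any warnings
    written there cannot end up in the version string."""
    result = subprocess.run(
        ["cutadapt", "--version"],
        env=env_with_c_locale(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Usage, on a machine with Cutadapt installed: `print(cutadapt_version())`. The parent environment is left untouched because the override is applied to a copy.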

apposada commented 3 years ago

Thanks for the quick reply, Felix. I am asking our sysadmins, and I also think it has to do with the Perl locale warning, because when I run Trim Galore on a machine that does not show these warnings, everything works fine.

I agree it would be better to fix the locale problems. Thanks a lot!

FelixKrueger commented 3 years ago

Good luck!