FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Using TrimGalore for different rounds of adaptor cuttings #148

Open carmencita opened 1 year ago

carmencita commented 1 year ago

Hi!

I am trying to implement the following cutadapt command using TrimGalore:

cutadapt -m 20 -O 20 -a "polyA=A{20}" -a "QUALITY=G{20}" -n 2 fastq_extracted/sample/R1.fastq.gz | 
cutadapt -m 20 -O 3 --nextseq-trim=10  -a "r1adapter=A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=3;max_error_rate=0.100000" - | 
cutadapt -m 20 -O 3 -a "r1polyA=A{18}" - | 
cutadapt -m 20 -O 20 -g "r1adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=20" --discard-trimmed -o fastq_trimmed/sample/R1.fastq.gz - 

to analyse Lexogen data

https://github.com/Lexogen-Tools/quantseqpool_analysis/blob/master/QuantSeqPoolAnalysis.sh

I know that TrimGalore can be used with different adaptor sequences (https://github.com/FelixKrueger/TrimGalore/issues/86). The equivalent TrimGalore command using multiple adaptors would then look as follows:

trim_galore -a " polyA=A{20} -a QUALITY=G{20} -a r1adapter=A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=3;max_error_rate=0.100000 -a r1polyA=A{18} -g r1adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=20 -n 4" --nextseq 10

However, I am unsure how to provide --discard-trimmed, the changing -O values, and --nextseq 10, since these are applied in different rounds. Should I put them inside the TrimGalore -a option? Or is there a workaround you could suggest? For context: I would like to run cutadapt via TrimGalore because TrimGalore is integrated into the Nextflow RNA-seq pipeline. Thank you for your help.
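For reference, the four rounds can be kept as separate cutadapt calls in a small wrapper, which is where settings such as -O, --nextseq-trim and --discard-trimmed can differ per round because each call has its own command line. The following is only a sketch of the original cutadapt pipeline, not a Trim Galore equivalent; the function name run_rounds, the DRY_RUN switch and the file paths in the usage line are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: the four cutadapt rounds from the post, one call per round.
# run_rounds and DRY_RUN are hypothetical names, not part of cutadapt or Trim Galore.

ADAPTER="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"

run_rounds() {
    local in_fq="$1" out_fq="$2"

    # Round 1: remove polyA / polyG stretches (up to two adapters per read)
    local -a r1=(cutadapt -m 20 -O 20 -a "polyA=A{20}" -a "QUALITY=G{20}" -n 2 "$in_fq")
    # Round 2: NextSeq-aware quality trimming plus polyA-prefixed adapter
    local -a r2=(cutadapt -m 20 -O 3 --nextseq-trim=10 -a "r1adapter=A{18}${ADAPTER};min_overlap=3;max_error_rate=0.100000" -)
    # Round 3: remaining polyA tails
    local -a r3=(cutadapt -m 20 -O 3 -a "r1polyA=A{18}" -)
    # Round 4: discard reads that still carry the adapter at the 5' end
    local -a r4=(cutadapt -m 20 -O 20 -g "r1adapter=${ADAPTER};min_overlap=20" --discard-trimmed -o "$out_fq" -)

    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        # Print one command per line instead of running anything
        printf '%s\n' "${r1[*]}" "${r2[*]}" "${r3[*]}" "${r4[*]}"
    else
        # Subshell so pipefail does not leak into the caller's shell
        ( set -o pipefail; "${r1[@]}" | "${r2[@]}" | "${r3[@]}" | "${r4[@]}" )
    fi
}
```

With DRY_RUN=1 the function only prints the commands, e.g. `DRY_RUN=1 run_rounds fastq_extracted/sample/R1.fastq.gz fastq_trimmed/sample/R1.fastq.gz`, which makes it easy to inspect the per-round options before running them.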

FelixKrueger commented 1 year ago

Hi @carmencita

This is a tricky question, but I think your solution of sticking everything into the -a option might indeed just work (it is also probably the only solution, as Trim Galore doesn't expose all of the options and functions of Cutadapt, for simplicity reasons).

--nextseq 10 (mutually exclusive with -q 20) and -o ... should be Trim Galore options, so you should be able to specify these on Tower, or via external arguments to the pipeline?

I am not exactly sure why you would want to choose the option --discard-trimmed, as it sounds like a fairly harsh call (if a read contains either polyA or adapter, it will get booted entirely?). Would quality trimming affect this as well?

jaanckae commented 6 months ago

Hi

I also have the same issue. Did you manage to solve this already? Putting everything in the -a option does not work for me.

trim_galore -a " polyA=A{20} -a QUALITY=G{20} -a r1adapter=A{18}AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=3;max_error_rate=0.100000 -a r1polyA=A{18} -g r1adapter=AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC;min_overlap=20 -n 4" --nextseq 10 input.fq.gz

results in this error message

Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 2.7
single-core operation.
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

Adapter sequence must contain DNA characters only (A,C,T,G or N)

Thanks in advance
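As a rough aid for reading the error above, the sketch below shows which characters in some of the adapter specifications fall outside the A, C, T, G, N set that the message names. It is not Trim Galore's own validation code and says nothing about how any particular version parses the -a string; the helper name non_dna_chars is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whatever is left of a string once A, C, G, T and N
# are removed. This is NOT how Trim Galore validates adapters; it only shows
# which characters lie outside the set named in the error message.
non_dna_chars() {
    printf '%s' "$1" | tr -d 'ACGTN'
}

for spec in "polyA=A{20}" "QUALITY=G{20}" "r1polyA=A{18}" "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"; do
    leftover="$(non_dna_chars "$spec")"
    if [ -n "$leftover" ]; then
        printf '%s\tcharacters outside ACGTN: %s\n' "$spec" "$leftover"
    else
        printf '%s\tplain DNA\n' "$spec"
    fi
done
```

The name= prefixes and the ;min_overlap=... suffixes are Cutadapt adapter syntax, so they only make sense if the wrapper passes the string through to Cutadapt unchanged.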

FelixKrueger commented 6 months ago

Which version are you using? My guess is that it may simply be too old (current is 0.6.10).

jaanckae commented 6 months ago

I'm using the one from nf-core, so 0.6.7.
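When comparing the two version numbers mentioned here, note that 0.6.7 is older than 0.6.10 even though a plain string comparison would order them the other way round. A minimal sketch of a version-aware comparison follows, assuming a sort that supports -V (as GNU coreutils does); the function name version_lt is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when version $1 is strictly older than version $2.
# Relies on `sort -V` (version sort), which is assumed to be available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

if version_lt "0.6.7" "0.6.10"; then
    echo "0.6.7 is older than 0.6.10"
fi
```

The installed version itself can be read with `trim_galore --version` and passed to the helper as the first argument.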