pmenzel closed this issue 2 years ago.
Hi @pmenzel
I can see that this might be occasionally useful, however:
* `--hardtrim5` is supposed to generate a new file with a defined sequence length. If you combine hard-trimming with adapter/quality trimming, a defined sequence length can no longer be guaranteed.
* Trim Galore has options to trim sequences from the 3' end (`--three_prime_clip_r[12]`), even though this can be difficult to 'get right', again because of the variability of the adapter and quality process.
* Trim Galore also has options to select sequences/sequence pairs based on a minimum (`--length`) and maximum (`--max_length`) read length. Don't you think you could select a combination of these options to suit your needs, or at least get you very close to what you had in mind?

Thanks for looking into it!
> * `--hardtrim5` is supposed to generate a new file with a defined sequence length. If you combine hard-trimming with adapter/quality trimming, a defined sequence length can no longer be guaranteed.

You mean guaranteed in the sense that all reads are exactly N bp long? But that would also depend on the input file, which might contain reads shorter than N in the first place. Still, I can understand that it is treated as a separate step in Trim Galore.

> * Trim Galore has options to trim sequences from the 3' end (`--three_prime_clip_r[12]`), even though this can be difficult to 'get right', again because of the variability of the adapter and quality process.
> * Trim Galore also has options to select sequences/sequence pairs based on a minimum (`--length`) and maximum (`--max_length`) read length. Don't you think you could select a combination of these options to suit your needs, or at least get you very close to what you had in mind?
Unfortunately, these options do not do what `--hardtrim5` does.

What I specifically have in mind is removing the extra last cycle in Illumina FASTQ files (e.g. cycle 151, 251, etc.), which often has bad quality or miscalled `G`s (NextSeq) and should not be included in downstream analysis.
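The per-read operation being asked for can be sketched as follows, assuming plain, uncompressed FASTQ with exactly four lines per record: after adapter/quality trimming, cap every read at N bases from the 5' end (N = 150 for a 151-cycle run), so that only reads still carrying the extra cycle lose it and already-shortened reads pass through unchanged. This is an illustration in Python, not Trim Galore's implementation; `read_fastq`, `cap_record` and `cap_fastq` are hypothetical helpers.

```python
import io
from typing import Iterator, TextIO, Tuple

# (header, sequence, plus line, quality string)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a simple 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def cap_record(record: Record, max_len: int) -> Record:
    """Keep at most max_len bases (and qualities) from the 5' end.

    Reads that are already max_len bp or shorter are returned unchanged.
    """
    header, seq, plus, qual = record
    return header, seq[:max_len], plus, qual[:max_len]


def cap_fastq(text: str, max_len: int) -> str:
    """Apply cap_record to every record of a FASTQ string."""
    lines = []
    for record in read_fastq(io.StringIO(text)):
        lines.extend(cap_record(record, max_len))
    return "\n".join(lines) + "\n"


# Toy example with max_len=4: the 5 bp read is cut, the 3 bp read is untouched
sample = "@r1\nACGTG\n+\nIIIII\n@r2\nACG\n+\nIII\n"
print(cap_fastq(sample, 4), end="")
```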
I can see your point, too; I'm just trying to get away with the options that are already there... :)
One more try: if you select `--three_prime_clip_r1 1 --three_prime_clip_r2 1`, this would be guaranteed to take off 1 bp from the 3' end, wouldn't it? The downside, of course, is that if a sequence had already been trimmed by adapter or quality trimming, you would lose one additional bp for these reads. Not ideal, but maybe tolerable for reads that long?
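The difference between the two approaches can be seen on read lengths alone. The sketch below, using made-up lengths from a 151-cycle run after adapter/quality trimming, compares an unconditional 1 bp 3' clip (the effect described for `--three_prime_clip_r1 1`) with capping reads at 150 bp. The helper names `clip_3prime` and `cap_length` are hypothetical and only model the resulting lengths.

```python
def clip_3prime(length: int, n: int = 1) -> int:
    """Read length after unconditionally removing n bp from the 3' end."""
    return max(length - n, 0)


def cap_length(length: int, max_len: int) -> int:
    """Read length after keeping at most max_len bp from the 5' end."""
    return min(length, max_len)


# Made-up read lengths after adapter/quality trimming of a 151-cycle run
lengths = [151, 151, 140, 98]
clipped = [clip_3prime(n_bp) for n_bp in lengths]  # [150, 150, 139, 97]
capped = [cap_length(n_bp, 150) for n_bp in lengths]  # [150, 150, 140, 98]
print(clipped, capped)
```

Both give 150 bp for full-length reads; only the clip also shortens reads that had already been trimmed.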
Yeah, I saw that option too, and as you said, one would always lose one base.
My specific dataset is an amplicon panel, in which the R1 and R2 reads of some amplicons barely overlap, so that is a case where every base counts. :)
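Why every base matters here can be shown with a small calculation, assuming R1 and R2 start at opposite ends of the same amplicon, so that their overlap is the sum of the two read lengths minus the amplicon length. The 290 bp amplicon and the `overlap_bp` helper are hypothetical.

```python
def overlap_bp(r1_len: int, r2_len: int, amplicon_len: int) -> int:
    """Bases covered by both mates; a negative value means the mates do not meet."""
    return r1_len + r2_len - amplicon_len


# Hypothetical 290 bp amplicon sequenced 2 x 150 bp
print(overlap_bp(150, 150, 290))  # 10
# Clipping one extra base from each mate shrinks the overlap by 2
print(overlap_bp(149, 149, 290))  # 8
```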
Anyway, I can just run `trim_galore` twice and get the desired result. I was just wondering if there would be a way to run it only once.
OK, if that works for you, that would save me from implementing another option. If it were your only chance to get what you need, I might be persuaded to take another look, but it seems you are happy enough to go with what we have right now.
Cheers, Felix
Hi,
would it be possible to run the default adapter/quality trimming followed immediately by hard-trimming to a maximum length (option `--hardtrim5`) in one go, so that the output files would be the same `_val_*.fq(.gz)` files as for normal trimming?

Best wishes!