Closed FrancisBlokzijl closed 5 years ago
Hi Francis,
To 1: I'm not sure whether allowing --nextseq-trim would work out of the box, but I could look into it. My guess is that poly-G reads won't matter much in your data, as low-complexity sequences tend not to match at all, so they will get kicked out at the mapping step anyway.
To 2: The 5' and 3' end trimming is actually carried out by Trim Galore and not Cutadapt itself, so the Cutadapt command will indeed not have changed between the two conditions.
PS: I am currently at a hackathon event, so I'm afraid I won't have much time to look into 1) for a while.
Best, Felix
Hi Felix,
Thanks for your quick response!
1) Yes, I agree, the reads that are largely poly-G won't map anyway. But I'm afraid that I will have reads that are not quality trimmed properly because of a few high-quality G's at the end of the read, and that therefore won't map very well, as described in this blog post by Simon Andrews. Because I have single-cell data, I want to keep as much data as possible. So if the quality trimming ignores the quality of the G's, as is done with --nextseq-trim, I will hopefully save some good reads.
2) Okay, that explains it. Just out of curiosity, why don't you use cutadapt --cut for this?
Thanks a lot and enjoy the hackathon event!
Best, Francis
Hi Francis,
1) That makes sense. I just had a go at enabling this option (only for standard trimming, not yet for RRBS or PolyA trimming), and from what I can tell it appears to do the job. Could you clone the development version and give it a whirl? The new option is called --2colour/--nextseq INT, where INT sets the quality cutoff that is normally set with -q, except that the qualities of G bases are ignored. Would you mind giving some feedback on whether this is what you were looking for?
2) The answer is probably that it has historical reasons (when we started writing Trim Galore some 8 years ago, many options were simply not available). Also, since people seem to tirelessly invent new ways to 'screw up' their data, having some flexibility to decide what to do with the data really helps.
All the best, Felix
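To make the idea of "ignoring the qualities of G bases" concrete, here is a minimal Python sketch of 3' quality trimming with a running sum, plus a switch that treats every G as if its quality were just below the cutoff. It is written for illustration only and is an assumption about how such trimming can work, not the actual code of Cutadapt or Trim Galore; the function name and parameters are hypothetical.

```python
def quality_trim_3prime(seq, quals, cutoff, ignore_g=False, base=33):
    """Return the index at which to cut the 3' end (keep seq[:index]).

    Illustrative sketch only: walks from the 3' end, summing
    (cutoff - quality), and cuts where that running sum peaks.
    """
    running_sum = 0
    best_sum = 0
    cut_at = len(seq)
    for i in reversed(range(len(seq))):
        q = ord(quals[i]) - base  # decode Phred+33 quality
        if ignore_g and seq[i] == "G":
            # Assumption: a G is treated as slightly below the cutoff,
            # so a high-quality G can no longer stop the trimming.
            q = cutoff - 1
        running_sum += cutoff - q
        if running_sum < 0:
            break  # enough good bases seen; stop scanning
        if running_sum > best_sum:
            best_sum = running_sum
            cut_at = i
    return cut_at


# Two poor-quality A's followed by two high-quality G's at the 3' end.
seq = "ACGTACGTAAGG"
quals = "IIIIIIII" + "##" + "II"

standard = quality_trim_3prime(seq, quals, cutoff=20)
two_colour = quality_trim_3prime(seq, quals, cutoff=20, ignore_g=True)
print(seq[:standard])
print(seq[:two_colour])
```

In this toy read, standard trimming stops at the first high-quality G and leaves the poor-quality A's in place, whereas the G-ignoring variant trims through them. That is the situation Francis describes above.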
Hi Felix!
Yes it works! Thanks a lot! I'm sure other people will appreciate this option as well!
There is no develop branch, though; the change was in master.
Some ideas for fine-tuning this feature:
1) The trimming report says: 0 bp quality trimmed. It would be good to include a counter in the report for the quality trimming that ignores G qualities as well.
2) Maybe only allow setting either -q or --nextseq, or give a warning, so that people know what they are doing. The cutadapt command now includes both -q 20 and --nextseq-trim=20. If the nextseq threshold is the same as the default for quality trimming (20), this doesn't affect the result. But if people set the nextseq cutoff lower than the default for normal quality trimming, they might not realize that a more stringent quality trimming, which does not ignore G's, is performed as well... Or maybe better: if only the nextseq cutoff is given, normal quality trimming is not performed, but if people explicitly supply both -q and --nextseq, then it's fine. Do I make any sense?
Thanks! Francis
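The second suggestion above could be sketched roughly as follows. This is a hypothetical helper, not Trim Galore's implementation: the function name, its parameters, and the warning text are assumptions made for illustration. Only the -q and --nextseq-trim arguments are taken from the Cutadapt command quoted in this thread.

```python
def quality_args(quality=None, nextseq=None, default_quality=20):
    """Hypothetical helper: choose quality-trimming arguments for Cutadapt.

    quality / nextseq are the user-supplied cutoffs (None if not given).
    Returns (argument list, list of warning strings).
    """
    args = []
    warnings = []
    if nextseq is None:
        # No two-colour trimming requested: fall back to normal -q trimming.
        q = default_quality if quality is None else quality
        args += ["-q", str(q)]
    else:
        # Two-colour trimming requested: do not silently add the default -q.
        args.append(f"--nextseq-trim={nextseq}")
        if quality is not None:
            # The user explicitly asked for both, so pass both along.
            args += ["-q", str(quality)]
            if quality > nextseq:
                warnings.append(
                    f"-q {quality} is stricter than --nextseq {nextseq}; "
                    "G qualities are not ignored by -q trimming"
                )
    return args, warnings


print(quality_args())
print(quality_args(nextseq=15))
print(quality_args(quality=20, nextseq=15))
```

With this design, the default cutoff of 20 never competes with a lower nextseq cutoff unless the user sets -q explicitly, and in that case a warning makes the interaction visible.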
I'll try to put that in next week, thanks for the feedback!
Thanks a lot!
Dear Felix,
Thanks for making this awesome wrapper! I have two questions:
1) Is it also possible to use Cutadapt's --nextseq-trim option? That would be great, as I clearly have poly-G's in my reads!
2) It seems that the Cutadapt command printed in the trimming_report.txt file does not contain all arguments. For example, I did one Trim Galore run with "--clip_R1 6 --clip_R2 6" and one run without this clipping. The trimming report does say:
All Read 1 sequences will be trimmed by 6 bp from their 5' end to avoid poor qualities or biases
All Read 2 sequences will be trimmed by 6 bp from their 5' end to avoid poor qualities or biases (e.g. M-bias for BS-Seq applications)
but the Cutadapt arguments are the same as in the run without the clipping:
This is cutadapt 1.18 with Python 3.6.6
Command line parameters: -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC demultiplexedR2.fastq.gz
I expected the cutadapt command to include -u/--cut 6, am I missing something?
Thanks a lot for your help in advance!
Best, Francis