Default vs Specified Adapter

ionox0 commented 6 years ago

Hi Felix!

This is not an issue but rather a question coming from a comparison we've done. I ran Trim Galore twice: once specifying the full adapter sequences that we use, and once without specifying anything, which defaults to the 13 bp Illumina adapter sequence.

We noticed that the default run showed a very small increase in the number of trimmed sequences, and thus a very small decrease in coverage. Does this make sense given the way the tool performs trimming? Would a shorter adapter be likely to result in more adapter trimming, given that the longer version starts with the same 13 bases?

Thank you for your help in developing this tool!

FelixKrueger commented 6 years ago

Hi ionox0,

I think my answer is that I don't know exactly what the differences would be, as I have never looked into this in more detail. I assume the differences ultimately come down to the number of errors allowed when you specify a longer sequence. For the default 13 bp sequence it is still fairly simple: if part of it is found right at the 3' end of a read, that sequence gets removed. If the full 13 bp sequence is found within a read, it may contain a 1 bp mismatch (the default error rate is 0.1, so 1 bp in a 13 bp sequence), and everything from the AGATC... onwards is removed.
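The arithmetic here can be sketched in a few lines. This assumes the number of tolerated errors is the matched length times the error rate, rounded down, which is consistent with the "1 bp in a 13 bp sequence" figure. The function name `allowed_errors` is made up for illustration and is not part of Trim Galore or Cutadapt.

```python
def allowed_errors(overlap_len: int, error_rate: float = 0.1) -> int:
    """Errors tolerated for a match of `overlap_len` bases (assumed: rounded down)."""
    return int(overlap_len * error_rate)


# Partial matches at the 3' end (fewer than 10 bases) tolerate no errors at all
# under this rule; the full 13 bp default tolerates one; 60 bp tolerates six.
for length in (5, 9, 10, 13, 60):
    print(f"{length:>2} bp -> {allowed_errors(length)} error(s) allowed")
```

Under this assumption, a short partial adapter match has to be exact, while a longer adapter gets a larger absolute error budget but also has to match over many more bases.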

If you now specify, say, a 60 bp adapter sequence, you would allow up to 6 bp of mismatch within the read. However, if for whatever reason you see more mismatches than that, the read will not get trimmed and will probably be kicked out in the alignment step later on. So I would imagine that the 13 bp trimming may initially appear somewhat stricter, but it might even allow more sequences to pass the mapping step later on. But as I said, I have never really tried very hard to figure out what the exact differences are. My gut feeling is that it will largely not matter much (and we personally have never used a long adapter sequence for trimming over here, ever). Hope this helps?
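A toy comparison may make this concrete. The sketch below is a deliberate simplification, not how Cutadapt actually aligns adapters: it counts mismatches only (no indels) at a single known position and compares them with the error budget. The helper names and the choice of mutated positions are hypothetical, and `LONG` is just an example 33-base sequence starting with the same 13 bases, since the poster's actual adapter is not given in the thread.

```python
SHORT = "AGATCGGAAGAGC"                    # the 13 bp default discussed above
LONG = SHORT + "ACACGTCTGAACTCCAGTCA"      # example longer adapter (33 bp)


def mutate(seq: str, positions: list[int]) -> str:
    """Introduce a substitution at each given index (simulated sequencing errors)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


def passes_budget(observed: str, adapter: str, error_rate: float = 0.1) -> bool:
    """True if `observed` matches `adapter` within the assumed mismatch budget."""
    overlap = min(len(observed), len(adapter))
    mismatches = sum(a != b for a, b in zip(observed[:overlap], adapter[:overlap]))
    return mismatches <= int(overlap * error_rate)


# Adapter read-through with four errors, all beyond the first 13 bases.
observed = mutate(LONG, [18, 23, 27, 31])

print(passes_budget(observed, SHORT))  # True: 0 mismatches in 13 bp, budget 1
print(passes_budget(observed, LONG))   # False: 4 mismatches in 33 bp, budget 3
```

In this toy case the short adapter would trigger trimming while the long one would not, which is the direction of the effect described above. How often this happens in real data depends on the error profile of the reads.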

ionox0 commented 6 years ago

Yes, thank you Felix. My guess at this point is that the extra bases fall in a region closer to the end of the read, which has a higher error rate. Thus they are trimmed more often, which results in (slightly) lower coverage because of supplying the full adapter.

ionox0 commented 6 years ago

Correction: that should read "less often" and "higher coverage" when supplying the full adapter sequence.

Thanks very much!

FelixKrueger / TrimGalore

Default vs Specified Adapter #32