Closed by vladsavelyev 6 years ago
Hmm, actually, manually looking at Long Ranger's BAM, they do not seem to be actually trimming that base :)
AFAIK second mates are not supposed to be trimmed (at least when we were evaluating Long Ranger that was not the case).
Well, I just pointed to their recommendation: the first bp of R2 empirically has about a 5x higher mismatch rate.
Shouldn't make much difference, but I was just wondering if you had explored that by any chance.
We haven't actually looked at that, but thanks for the reference. In general, we've been trying to be as close to Long Ranger as possible when it comes to stuff like this, which is why we basically just followed their preprocessing formula exactly. In terms of trimming that base, I'd be very surprised if it made any sort of noticeable difference, since it's very unlikely to change any of the candidate alignments (maybe this is why the Long Ranger authors aren't following their own recommendation). The only other possibility would be that it may change the initial alignment scores, but even then it seems unlikely that it'd produce any substantial changes (especially since 10x reads intrinsically have higher error rates towards their ends anyway, compared to e.g. standard Illumina reads -- we take this into account in EMA by having a lower clipping penalty). In any case, it's definitely something to keep in mind; maybe we can make the trimming parameters user-specified with Long Ranger's as the default.
Thanks a lot for the detailed answer! That totally makes sense. I agree that there is no reason why this extra base would affect anything. I'd leave it as is. Unless someone else requests this in the future, I don't think there is a need now to make the trimming parameters customizable, especially since the Long Ranger guys don't seem to stick to their own recommendation.
As far as I understand, you trim 16+7 bases from the first read only, and leave the mate read alone.
The Long Ranger authors actually recommend additionally trimming the first base of R2: https://community.10xgenomics.com/t5/Genome-Exome-Forum/Best-practices-for-trimming-adapters-when-variant-calling/td-p/470
Have you guys considered that by any chance? Wondering if it would improve speed or quality at all.
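To make the two options concrete, here is a minimal Python sketch of the trimming being discussed. It is not EMA's or Long Ranger's actual code: the function name `trim_pair`, its parameters, and the reading of "16+7" as a 16-base barcode followed by 7 extra bases are assumptions made for illustration only.

```python
from typing import NamedTuple


class TrimmedPair(NamedTuple):
    barcode: str   # leading bases of R1, assumed here to be the barcode
    r1_seq: str
    r1_qual: str
    r2_seq: str
    r2_qual: str


def trim_pair(r1_seq, r1_qual, r2_seq, r2_qual,
              barcode_len=16, extra_len=7, r2_trim=0):
    """Hypothetical helper: trim barcode_len + extra_len bases from the
    start of R1 and r2_trim bases from the start of R2.

    r2_trim=0 leaves the mate untouched (the behaviour described in this
    thread); r2_trim=1 would apply the extra first-base trim on R2.
    """
    if len(r1_seq) != len(r1_qual) or len(r2_seq) != len(r2_qual):
        raise ValueError("sequence and quality strings must have equal length")
    r1_cut = barcode_len + extra_len
    return TrimmedPair(
        barcode=r1_seq[:barcode_len],
        r1_seq=r1_seq[r1_cut:],
        r1_qual=r1_qual[r1_cut:],
        r2_seq=r2_seq[r2_trim:],
        r2_qual=r2_qual[r2_trim:],
    )


# Toy read pair: 16 "barcode" bases + 7 extra bases + 4 bases of insert.
r1 = "A" * 16 + "C" * 7 + "GGTT"
r2 = "TACG"
default = trim_pair(r1, "I" * len(r1), r2, "I" * len(r2))
with_r2_trim = trim_pair(r1, "I" * len(r1), r2, "I" * len(r2), r2_trim=1)
```

With the defaults, R2 is passed through unchanged; setting `r2_trim=1` only drops its first base and the matching quality character, which is why the change is expected to have little effect on alignment.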