FelixKrueger / TrimGalore

A wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data
GNU General Public License v3.0

Change processing strategy for `--clock` option #183

Closed FelixKrueger closed 6 months ago

FelixKrueger commented 6 months ago

In its current implementation, the dual-UMI RRBS reads come in the following format:

Read 1  5' UUUUUUUU CAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF TACTG UUUUUUUU 3'
Read 2  3' UUUUUUUU GTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ATGAC UUUUUUUU 5'

The clock processing removes the 8 bp UMI (which is transferred to the read ID) and clips the following 5 bp. In addition, 15 bp are clipped off the 3' end. This takes care of the fill-in bias in RRBS experiments, but only for Read 1. We also need to remove 2 bp from the 5' end of Read 2 to get rid of the fill-in bias at the start of Read 2 (marked with B for biased):

Read 1  5' UUUUUUUU CAGTA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF TACTG UUUUUUUU 3'
Read 2  3' UUUUUUUU GTCAT BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF ATGAC UUUUUUUU 5'

Resulting in:

Read 1  5' - FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF - 3'
Read 2  3' - FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF   - 5'
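
For illustration only, the per-read clipping described above could be sketched in Python roughly as follows. This is not the Trim Galore code (which is written in Perl); `clock_trim` and `tag_read_id` are hypothetical names, and the way the UMIs are appended to the read ID is an assumed format. Note that Read 2 is stored 5' to 3' in its FastQ file, so its 5' end is the start of the string even though the diagram draws it reversed.

```python
# Lengths taken from the description in this issue.
UMI_LEN = 8            # UMI at the 5' end of each read
FIXED_LEN = 5          # fixed sequence directly after the UMI
R2_FILL_IN = 2         # biased fill-in positions at the start of Read 2
THREE_PRIME_CLIP = 15  # bases clipped off the 3' end of each read


def clock_trim(seq, qual, is_read2):
    """Return (umi, trimmed_seq, trimmed_qual) for a single read.

    Hypothetical helper: clips UMI + fixed sequence from the 5' end,
    2 extra bases for Read 2, and 15 bases from the 3' end.
    """
    umi = seq[:UMI_LEN]
    start = UMI_LEN + FIXED_LEN + (R2_FILL_IN if is_read2 else 0)
    end = len(seq) - THREE_PRIME_CLIP
    if end <= start:
        # Read too short to survive clipping; return empty sequence.
        return umi, "", ""
    return umi, seq[start:end], qual[start:end]


def tag_read_id(read_id, umi_r1, umi_r2):
    """Append both UMIs to the read ID (assumed format, for illustration)."""
    return f"{read_id}:{umi_r1}:{umi_r2}"


if __name__ == "__main__":
    insert = "ACGT" * 10  # made-up 40 bp fragment
    read = "AAAACCCC" + "CAGTA" + insert + "TACTG" + "GGGGTTTT"
    qual = "I" * len(read)
    for is_r2 in (False, True):
        umi, seq, q = clock_trim(read, qual, is_r2)
        print("Read 2" if is_r2 else "Read 1", umi, len(seq), seq)
```

With these settings Read 2 ends up 2 bp shorter than Read 1 at its 5' end, which matches the "Resulting in" diagram above.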

FelixKrueger commented 6 months ago

This has now been addressed here: 9daa764986c646805fa3d0c4aba7c9e053404938