a-h-b / dadasnake

Amplicon sequencing workflow heavily using DADA2 and implemented in snakemake
GNU General Public License v3.0
45 stars 17 forks source link

reverse complement code is wrong in `cutadapt.smk` and `cutadapt.single.smk` (solution provided) #40

Open ppreshant opened 2 months ago

ppreshant commented 2 months ago

Hello! I discovered a bug when using primer sequence with small caps which was not reverse complementing properly. This was causing cutadapt to truncate some random 10 base pairs from my reads. I hadn't discovered this bug until I saw the cutadapt log and presumably people would rarely use small caps so they wouldn't encounter this.

The issue is in these lines

FWD_RC=`echo {config[primers][fwd][sequence]} | tr '[ATUGCYRSWKMBDHNatugcyrswkbdhvn]' '[TAACGRYSWMKVHDBNtaacgryswmkvhdbn]' |rev`

You can see that the tr command's regex (1st arg) and substitutions (2nd arg) are not of the same length. This us because of a few missing letters.. ATUGCYRSWKMBDHNatugcyrswkbdhvn TAACGRYSWMKVHDBNtaacgryswmkvhdbn

This needs to be fixed by changing the tr's first argument (regex) from ATUGCYRSWKMBDHNatugcyrswkbdhvn to ATUGCYRSWKMBDHVNatugcyrswkmbdhvn in multiple occurrences of lines FWD_RC= and RVS_RC= in both cutadapt smk files.

In summary, this is the modified tr code tr '[ATUGCYRSWKMBDHVNatugcyrswkmbdhvn]' '[TAACGRYSWMKVHDBNtaacgryswmkvhdbn]' |rev