Unexpected PAM truncation

RussBainer commented 5 months ago

Hi JP and team, I'm trying to make a new CrisprNuclease object based on an enzyme that has been shown to have a more permissive PAM sequence, which I initially tried to encode by specifying more PAMs and weights. When I did this, I found that the PAMs appear to be internally capped at 4:

> pams
 [1] "(3/3)ACC" "(3/3)CCC" "(3/3)TCC" "(3/3)GCC" "(3/3)ACA" "(3/3)CCA" "(3/3)TCA" "(3/3)GCA" "(3/3)ACG" "(3/3)CCG" "(3/3)TCG" "(3/3)GCG"
[13] "(3/3)ACT" "(3/3)CCT" "(3/3)TCT" "(3/3)GCT"
> pw
 [1] 0.40 0.40 0.40 0.40 0.43 0.43 0.43 0.43 0.32 0.32 0.32 0.32 0.30 0.30 0.30 0.30
> 
> eNme2c <- CrisprNuclease("eNme2c",
+                          targetType="DNA",
+                          pams=pams,
+                          weights=pw,
+                          metadata=list(description="eNme2c nuclease, Cas9 variant from Neisseria meningitidis"),
+                          pam_side="3prime",
+                          spacer_length=20)
> 
> pams(eNme2c)
DNAStringSet object of length 4:
    width seq                                                                                                            names               
[1]     3 ACA                                                                                                            ACA
[2]     3 CCA                                                                                                            CCA
[3]     3 TCA                                                                                                            TCA
[4]     3 GCA                                                                                                            GCA

This does not happen when I make a simple Nuclease object, but it is introduced when I turn that into a CrisprNuclease:

> flarg <- Nuclease('Flarg', 'DNA', motifs = pams, weights = pw)
> motifs(flarg)
DNAStringSet object of length 16:
     width seq
 [1]     3 ACC
 [2]     3 CCC
 [3]     3 TCC
 [4]     3 GCC
 [5]     3 ACA
 ...   ... ...
[12]     3 GCG
[13]     3 ACT
[14]     3 CCT
[15]     3 TCT
[16]     3 GCT
> flarg.cn <- new("CrisprNuclease", flarg, pam_side="3prime", spacer_length = as.integer(20))
> pams(flarg.cn)
DNAStringSet object of length 4:
    width seq                                                                                                            names               
[1]     3 ACA                                                                                                            ACA
[2]     3 CCA                                                                                                            CCA
[3]     3 TCA                                                                                                            TCA
[4]     3 GCA                                                                                                            GCA

I personally have a workaround for this use case, but I thought I would raise it in case this isn't the functionality you want.

Thanks again for this awesome toolset!

Jfortin1 commented 5 months ago

@RussBainer Try with primary=FALSE
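
For readers wondering why exactly four PAMs came back: in the transcript above, the four sequences returned by pams() are the ones that share the highest weight (0.43). The base-R sketch below reproduces that selection from the posted vectors. Treating "primary" as "highest weight" is an assumption drawn from this output, not a confirmed description of the package internals, and the variable names primary_pams and primary_seqs are made up for illustration.

```r
# PAMs and weights exactly as posted in the first comment.
pams <- c("(3/3)ACC", "(3/3)CCC", "(3/3)TCC", "(3/3)GCC",
          "(3/3)ACA", "(3/3)CCA", "(3/3)TCA", "(3/3)GCA",
          "(3/3)ACG", "(3/3)CCG", "(3/3)TCG", "(3/3)GCG",
          "(3/3)ACT", "(3/3)CCT", "(3/3)TCT", "(3/3)GCT")
pw <- c(0.40, 0.40, 0.40, 0.40, 0.43, 0.43, 0.43, 0.43,
        0.32, 0.32, 0.32, 0.32, 0.30, 0.30, 0.30, 0.30)

# Assumption: the "primary" PAMs are those tied for the highest weight.
primary_pams <- pams[pw == max(pw)]

# Drop the "(3/3)" prefix so the result can be compared with the printed sequences.
primary_seqs <- sub("^\\(3/3\\)", "", primary_pams)
print(primary_seqs)
```

This yields ACA, CCA, TCA and GCA, the same four sequences shown in the pams(eNme2c) output.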

RussBainer commented 5 months ago

@Jfortin1 thanks for the pointer and sorry to be slow responding. After your tip I understand the tooling better and realize that the objects are working as intended. Thanks!

In case others reach this page: the pams() function returns only the most likely PAM sequences by default (pass primary=FALSE to get all of them), and secondary sequences are not included in the findSpacers() call unless canonical=FALSE is set, which you can discover if you carefully RTFM :-). These defaults make sense to me, but they confused me when designing sequences for a nuclease with multiple high-probability PAMs.
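
A minimal sketch of the two settings mentioned in this thread, assuming crisprBase and crisprDesign are installed from Bioconductor. The nuclease definition is copied from the first comment. The target sequence is a hypothetical placeholder, and passing a plain character string to findSpacers() is assumed to work here; check the documentation for your installed versions.

```r
library(crisprBase)    # CrisprNuclease(), pams()
library(crisprDesign)  # findSpacers()

# PAMs and weights as posted in the first comment.
pams <- c("(3/3)ACC", "(3/3)CCC", "(3/3)TCC", "(3/3)GCC",
          "(3/3)ACA", "(3/3)CCA", "(3/3)TCA", "(3/3)GCA",
          "(3/3)ACG", "(3/3)CCG", "(3/3)TCG", "(3/3)GCG",
          "(3/3)ACT", "(3/3)CCT", "(3/3)TCT", "(3/3)GCT")
pw <- c(0.40, 0.40, 0.40, 0.40, 0.43, 0.43, 0.43, 0.43,
        0.32, 0.32, 0.32, 0.32, 0.30, 0.30, 0.30, 0.30)

eNme2c <- CrisprNuclease("eNme2c",
                         targetType = "DNA",
                         pams = pams,
                         weights = pw,
                         metadata = list(description = "eNme2c nuclease, Cas9 variant from Neisseria meningitidis"),
                         pam_side = "3prime",
                         spacer_length = 20)

# Default call: only the primary PAMs (four in the transcript above).
primary_only <- pams(eNme2c)

# primary=FALSE, as suggested in this thread, asks for every PAM on the object.
all_pams <- pams(eNme2c, primary = FALSE)

# Hypothetical target sequence, used only to illustrate the call.
target <- "ATGGCTAGCTTGACCGATCGTACGGATCCATGCAAGCTTGGCACTGGCCGTCGTTTTACAACG"

# canonical=FALSE asks findSpacers() to consider the secondary PAMs as well.
spacers_all <- findSpacers(target,
                           crisprNuclease = eNme2c,
                           canonical = FALSE)
```

Comparing spacers_all with the result of the same call using the default canonical=TRUE is a quick way to see how many candidate sites depend on the secondary PAMs.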

Thanks as always for the toolset!

crisprVerse / crisprDesign

Unexpected PAM truncation #34