BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Improve cas9 documentation #259

Open manulera opened 2 months ago

manulera commented 2 months ago

Currently the functions do not have docstrings, and it's hard to understand how to use them.

JeffXiePL commented 1 month ago

Hi Manu, thought I would join in here and say that I'm not sure how to use the CRISPR functions either. I have looked at the tests, but I couldn't understand why the tests worked while my example here didn't. Could you please take a look if you have time?

manulera commented 1 month ago

Hi @JeffXiePL,

Yes, as mentioned before, the crispr module is not very well documented yet.

For documentation purposes (in case someone is interested / documents this in the hackathon), to design a gRNA we have to take into account three things:

  1. The protospacer sequence: the sequence that is homologous to the target sequence. Cas9 cuts within it, about 3 bp upstream of the PAM.
  2. The PAM sequence: a sequence that must be present in the target immediately after the protospacer. For Cas9 this sequence is "NGG", but it can be different for other enzymes.
  3. The scaffold sequence: the sequence that is needed for the gRNA to bind to the Cas9 enzyme.

I think that's roughly correct, but a great place to learn more about CRISPR is this Addgene post.
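
To make the protospacer and PAM requirements concrete, here is a minimal plain-Python sketch. It is not how pydna implements anything: the function name `find_cas9_sites` is hypothetical, it only scans the top strand of a linear sequence, and the cut offset of 3 bp upstream of the PAM is an assumption based on the usual convention for Cas9.

```python
import re


def find_cas9_sites(target: str, protospacer: str) -> list:
    """Return top-strand cut positions for a protospacer followed by an NGG PAM.

    Hypothetical helper for illustration only; it ignores the reverse
    strand and circular sequences.
    """
    # Protospacer, then any base, then GG (the "NGG" PAM), case-insensitive.
    # A lookahead is used so that overlapping sites are also reported.
    pattern = re.compile(
        f"(?=({re.escape(protospacer)})[ACGT]GG)", re.IGNORECASE
    )
    cuts = []
    for match in pattern.finditer(target):
        pam_start = match.start() + len(protospacer)
        # Assumed convention: blunt cut 3 bp upstream of the PAM.
        cuts.append(pam_start - 3)
    return cuts
```

With the sequences from this thread, `find_cas9_sites("GTTACTTTACCCGACGTCCCaGG", "GTTACTTTACCCGACGTCCC")` returns `[17]`, while the same target ending in `aaa` instead of `aGG` returns an empty list because there is no PAM.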

As for pydna, the cas9 class takes the protospacer sequence as an argument when it is instantiated, so you can either:

  1. Provide the full gRNA sequence and use the protospacer function to get the protospacer sequence.
  2. Provide the protospacer sequence and instantiate the cas9 class directly.

Below is a minimal example for both cases:

from pydna.dseqrecord import Dseqrecord
from pydna.crispr import cas9, protospacer

#         <----protospacer---><-------scaffold----------------->
guide =  "GTTACTTTACCCGACGTCCCgttttagagctagaaatagcaagttaaaataagg"
target = "GTTACTTTACCCGACGTCCCaGG"
#                             <->
#                             PAM

# Create an enzyme object with the protospacer
enzyme = cas9("GTTACTTTACCCGACGTCCC")

target_dseq = Dseqrecord(target)

# Cut using the enzyme
print('cutting with enzyme 1:', target_dseq.cut(enzyme))

# Get the protospacer from the full gRNA sequence
gRNA_protospacers = protospacer(Dseqrecord(guide), cas=cas9)
# Print the protospacer (it's a list because often plasmids contain multiple gRNAs)
print('protospacer:', gRNA_protospacers[0])
gRNA_protospacer = gRNA_protospacers[0]

# Create an enzyme from the protospacer
enzyme2 = cas9(gRNA_protospacer)

# Simulate the cut
print('cutting with enzyme 2:', target_dseq.cut(enzyme2))

# Note that without the PAM, the cut will not be made.

target_noPAM_dseq = Dseqrecord("GTTACTTTACCCGACGTCCCaaa")
print("cutting with no PAM in target:", target_noPAM_dseq.cut(enzyme2))
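
For intuition about what extracting a protospacer from a full gRNA involves, the idea can be sketched in plain Python: find where the scaffold starts and take the bases just before it. This is an illustrative assumption, not necessarily how pydna's `protospacer` function works; `split_guide` is a hypothetical name, and the 20 nt protospacer length is taken from the example above.

```python
def split_guide(guide: str, scaffold: str, size: int = 20):
    """Split a gRNA into (protospacer, scaffold), or return None.

    Hypothetical helper for illustration only: it finds the first
    occurrence of the scaffold (case-insensitive) and takes the
    `size` bases immediately upstream of it as the protospacer.
    """
    idx = guide.upper().find(scaffold.upper())
    # idx is -1 when the scaffold is absent; idx < size also covers
    # guides with too few bases in front of the scaffold.
    if idx < size:
        return None
    return guide[idx - size:idx], guide[idx:]


guide = "GTTACTTTACCCGACGTCCCgttttagagctagaaatagcaagttaaaataagg"
scaffold = "gttttagagctagaaatagcaagttaaaataagg"
result = split_guide(guide, scaffold)
print(result)
```

For the guide used in this thread, the first element of `result` is `GTTACTTTACCCGACGTCCC`, the same protospacer that is passed to `cas9` directly in the first half of the example.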

Hopefully this is useful for the hackathon

JeffXiePL commented 1 month ago

Thank you so much for the detailed explanation! This makes a lot more sense now, and I'll work on the CRISPR module in the upcoming days and at the hackathon.

dgruano commented 1 month ago

I'm also planning on improving the CRISPR module functions and documentation! See #215

manulera commented 1 month ago

@hiyama341

hiyama341 commented 1 month ago

I'm down for working on this too :)