BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Improve typing of functions in 'crispr' module #215

Open dgruano opened 8 months ago

dgruano commented 8 months ago

I was playing around with the crispr module and came across a weird error where the cut coordinates of a cas9 object were way larger than the target sequence.

from pydna.dseqrecord import Dseqrecord
from pydna.crispr import cas9

guide = Dseqrecord("GTTACTTTACCCGACGTCCC")
target = Dseqrecord("GTTACTTTACCCGACGTCCCaGG")

# Create an enzyme object with the guide RNA
enzyme = cas9(str(guide.seq))

# Search for a cutsite in the target sequence
print(enzyme.search(target))  # prints [148] (should be 18)
print(len(target))  # prints 23

The problem was that I was passing a Dseqrecord object and not a string. I am not very familiar with the rest of pydna yet: do most functions require a string, or a Dseq / Dseqrecord object? Should we check the input type within the functions, or add type hinting?
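To make the "check the input type within the functions" option concrete, here is a minimal sketch of a runtime guard. The helper `require_plain_sequence` and the `FakeRecord` stand-in are hypothetical names invented for this illustration; neither is part of pydna, and the sketch deliberately avoids importing pydna so it runs on its own.

```python
def require_plain_sequence(target):
    """Return `target` unchanged if it is a str, otherwise raise TypeError.

    Hypothetical helper for illustration only; not part of pydna.
    """
    if not isinstance(target, str):
        raise TypeError(
            f"expected a plain str sequence, got {type(target).__name__}; "
            "try passing str(record.seq) instead"
        )
    return target


class FakeRecord:
    """Stand-in for a record-like object, used only in this example."""

    def __init__(self, seq):
        self.seq = seq


record = FakeRecord("GTTACTTTACCCGACGTCCCaGG")

# A plain string passes through untouched.
checked = require_plain_sequence(str(record.seq))

# The record object itself is refused with an explicit error
# instead of silently producing odd coordinates.
try:
    require_plain_sequence(record)
    rejected = False
except TypeError:
    rejected = True
```

A guard like this fails loudly at the call site, which is the main advantage over type hints alone, since hints are not enforced at runtime.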

Let me know if I can help.

BjornFJohansson commented 8 months ago

Hi and thanks for your interest in pydna. I have been busy with this year's round of grant proposals; normally I try to respond more quickly.

The crispr module right now is a minimally working example. I think the way to go here is to specify something that intuitively describes a linear ssDNA molecule. In pydna, Dseq and Dseqrecord are used for dsDNA. I think better type hinting at the least, and perhaps accepting pydna.seqrecord.SeqRecord, would make sense?
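One way the "better type hinting plus accepting record objects" idea could look is sketched below. The `HasSeq` protocol, the `SequenceInput` alias and the `as_upper_string` function are all hypothetical names for this example, and `DemoRecord` is only a stand-in for a record class; whether pydna.seqrecord.SeqRecord would be handled this way is an assumption, not something the module does today.

```python
from typing import Protocol, Union


class HasSeq(Protocol):
    """Anything exposing a `.seq` attribute (hypothetical protocol)."""

    seq: object


# Hypothetical alias: either a plain string or a record-like object.
SequenceInput = Union[str, HasSeq]


def as_upper_string(sequence: SequenceInput) -> str:
    """Normalize the accepted input types to an upper-case string."""
    if isinstance(sequence, str):
        return sequence.upper()
    if hasattr(sequence, "seq"):
        # Record-like object: use the text of its sequence.
        return str(sequence.seq).upper()
    raise TypeError(f"unsupported sequence type: {type(sequence).__name__}")


class DemoRecord:
    """Stand-in for a record class, used only in this example."""

    def __init__(self, seq):
        self.seq = seq


from_string = as_upper_string("gttactttaccc")
from_record = as_upper_string(DemoRecord("gttactttaccc"))
```

With a signature like this, a static type checker can flag unsupported arguments, while the function body still normalizes both accepted forms to the same string.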

manulera commented 2 months ago

Hi @dgruano maybe you want to give a go at this one in the Hackathon?

manulera commented 2 months ago

Related to #257

dgruano commented 2 months ago

Yes, I was counting on doing that!

(actually I would swear I had tagged this issue on #257 yesterday...)

manulera commented 2 months ago

A nice followup to this is the documentation: https://github.com/BjornFJohansson/pydna/issues/259

hiyama341 commented 2 months ago

I also have some ideas that would be cool to implement if you wanna team up for the hackathon @dgruano :)

dgruano commented 2 months ago

I'm all ears!

hiyama341 commented 1 month ago

Hi @dgruano, so some of the things I was thinking of incorporating are:

These were just some preliminary thoughts. Looking forward to hearing what you think. :)

dgruano commented 1 month ago

Those are really good suggestions! Maybe we could compile a list of enzymes and methods with appropriate references and then detail the needed steps (e.g. Cas12 is just creating a new enzyme class, but CRISPR-BEST may need new functions). Something like:

| Feature | Type | Reference |
| --- | --- | --- |
| Cas12 / Cpf1 | New enzyme | https://www.cell.com/cell/fulltext/S0092-8674(15)01200-3 |
| Alternative Cas9 | New enzyme | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4393360 |
| Analyze sequence context | New feature | https://www.nature.com/articles/nbt.4199 |
| Genome editing | New feature | https://pubs.acs.org/doi/full/10.1021/acssynbio.3c00188 and here |
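As a rough illustration of why a new enzyme could be "just a new class", the sketch below describes a guided nuclease by its PAM, the side the PAM sits on, and a cut offset. Everything here is hypothetical and independent of pydna's actual classes: `GuidedNuclease`, `spcas9_like` and `cas12a_like` are invented names, only the top strand is searched, and the returned coordinate (1-based position of the first base 3' of the top-strand cut) is an assumed convention. The Cas12a offset of 18 bases reflects the commonly reported staggered cut and should be checked against the reference before use.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GuidedNuclease:
    """Hypothetical description of an RNA-guided nuclease."""

    protospacer: str
    pam: str               # regular expression matching the PAM
    pam_first: bool        # True if the PAM lies 5' of the protospacer
    bases_before_cut: int  # protospacer bases 5' of the top-strand cut

    def search(self, target: str) -> List[int]:
        """Return assumed 1-based cut positions on the top strand only."""
        spacer = re.escape(self.protospacer.upper())
        if self.pam_first:
            pattern = f"(?={self.pam}({spacer}))"
        else:
            pattern = f"(?=({spacer}){self.pam})"
        return [
            m.start(1) + self.bases_before_cut + 1
            for m in re.finditer(pattern, target.upper())
        ]


def spcas9_like(protospacer: str) -> GuidedNuclease:
    # NGG PAM on the 3' side, blunt cut 3 bases upstream of the PAM.
    return GuidedNuclease(protospacer, "[ACGT]GG", False, len(protospacer) - 3)


def cas12a_like(protospacer: str) -> GuidedNuclease:
    # TTTV PAM on the 5' side; offset of 18 is an assumption (see lead-in).
    return GuidedNuclease(protospacer, "TTT[ACG]", True, 18)


guide = "GTTACTTTACCCGACGTCCC"
cas9_cuts = spcas9_like(guide).search("GTTACTTTACCCGACGTCCCaGG")
cas12_cuts = cas12a_like(guide).search("TTTA" + guide)
```

Under these assumptions, adding an enzyme amounts to supplying three parameters, whereas features such as CRISPR-BEST would need logic beyond a single cut position.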

I am unsure how you would use nearmiss to limit off-targets; can you elaborate on what you were thinking? I will certainly take a look at it for my other suggestion in #267!

dgruano commented 1 month ago

Other possible features:

Near PAM-less / PAM-flexible enzymes

The CRISPR module should also support those Cas enzymes that have more than one PAM. For this, we have to:

PAM site search

Taking advantage of Dseq.get_cutsites(), we could check all possible PAMs for the currently implemented Cas enzymes (or those enzymes in the collection of the user). We could add a constant crispr.CAS_ENZYMES in the module.
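A minimal sketch of what such a constant and a PAM scan might look like follows. It uses plain regular expressions on a string rather than Dseq.get_cutsites(), searches the top strand only, and the contents of `CAS_ENZYMES` are illustrative: the enzyme names and PAM lists shown are assumptions for the example, not an agreed module constant. Allowing a list of PAMs per enzyme also covers the multi-PAM case mentioned above.

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAMs below.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "V": "[ACG]", "N": "[ACGT]",
}

# Hypothetical constant: enzyme name -> list of PAMs (illustrative values).
CAS_ENZYMES = {
    "SpCas9": ["NGG"],
    "SaCas9": ["NNGRRT"],
    "Cas12a": ["TTTV"],
    "SpRY": ["NRN", "NYN"],
}


def pam_to_regex(pam):
    """Translate an IUPAC PAM such as 'NGG' into a regular expression."""
    return "".join(IUPAC[base] for base in pam.upper())


def find_pam_sites(target, enzymes=CAS_ENZYMES):
    """Map each enzyme to the sorted 0-based start positions of its PAMs.

    Only the top strand is scanned; a lookahead is used so that
    overlapping PAM occurrences are all reported.
    """
    target = target.upper()
    hits = {}
    for name, pams in enzymes.items():
        positions = set()
        for pam in pams:
            pattern = f"(?={pam_to_regex(pam)})"
            positions.update(m.start() for m in re.finditer(pattern, target))
        hits[name] = sorted(positions)
    return hits


sites = find_pam_sites("GTTACTTTACCCGACGTCCCaGG")
```

A real implementation would also need to scan the reverse complement and relate each PAM to a valid protospacer; this sketch only locates the PAM motifs themselves.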

On-target and off-target scores

I'm not very knowledgeable in this respect, but this could be a nice addition for the designed guides. Some references are:

On-Target

Off-Target

dgruano commented 1 month ago

I totally missed this one:

Support for base editors

This is related to something we want to do in ShareYourCloning. We could achieve this like:

hiyama341 commented 1 month ago

Cool suggestions @dgruano!

For the nearmiss, I think it is a bit of overkill since the computational load is pretty heavy.