BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

User cloning #314

Closed dgruano closed 1 week ago

dgruano commented 1 month ago

I have a minimal working example of USER cloning, to address #309

As a summary, USER removes the uracil base from a dsDNA, leaving an abasic site that is subsequently cleaved by an AP lyase. If the upstream sequence is short enough, it will detach due to instability of the ds bonds, leaving a 3' overhang.
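To make the mechanism concrete, here is a rough sketch on a single strand written 5'->3'. It is not the code in this PR: the function name `user_release` and the `max_upstream` threshold are made up for illustration, and the real detachment limit depends on sequence and conditions.

```python
def user_release(strand: str, max_upstream: int = 12):
    """Toy model of USER + AP lyase acting on one strand, written 5'->3'.

    Returns (released_oligo, remaining_strand, overhang_len), or None when
    there is no dU or the upstream stretch is too long to detach.
    max_upstream is a hypothetical threshold, not a measured value.
    """
    pos = strand.upper().find("U")
    if pos == -1 or pos > max_upstream:
        return None
    released = strand[:pos]        # short oligo 5' of the dU, assumed to melt off
    remaining = strand[pos + 1:]   # the dU itself is excised
    overhang_len = pos + 1         # bases left unpaired on the opposite strand
    return released, remaining, overhang_len


result = user_release("cccUaaagggttt")
```

Here the three bases upstream of the dU are released, so the opposite strand is left with a 4-base 3' overhang (the three bases that paired with `ccc` plus the one that faced the dU).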

I created a USER class that mimics the reaction of USER + AP lyase, following the structure of a restriction enzyme. However, the current implementation does not allow Dseq.cut to use the search method. This is because:

  1. The overhang varies depending on the number of bases upstream of the dU.
  2. There is no way to check for the presence of a dU in the crick strand by looking at the watson strand. Therefore, instead of the usual method of searching for motifs in watson and rc(watson), the USER.search method only looks for USER sites in one strand.

Finally, the products method implements this search function properly, searching first in the watson and then in the crick strand to find all possible combinations. These are sorted in descending order of melting temperature of the double-stranded fragment (used here as a stability proxy).
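The ordering step could look roughly like the sketch below. The Tm formula here is the simple Wallace rule (2 °C per A/T, 4 °C per G/C), used only as a stand-in; which melting temperature function the PR actually calls is not stated above, and the names `wallace_tm` and `candidates` are made up.

```python
def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate: 2 * (A + T) + 4 * (G + C). Stand-in only."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


# Hypothetical double-stranded stretches that would have to melt off
candidates = ["ccc", "aaatt", "gcgcgc"]

# Highest Tm first, following the descending order described above
ranked = sorted(candidates, key=wallace_tm, reverse=True)
```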

@BjornFJohansson , you had noted down an outline and written some code and tests. While my implementation is different, could you check if this is moving in the right direction?

@hiyama341 you also have implemented USER in teemi, and may have used USER in vitro, any thoughts?

manulera commented 1 month ago

Hello,

I have made a pull request to your pull request (https://github.com/dgruano/pydna/pull/1) with a small refactor that allows using the USER objects as regular restriction enzymes.

It's a bit hacky, but I don't think it could be a problem. Basically, you make the property .ovhg a method with a @property decorator, and as you find the sites in search, you store the overhangs of the cuts in a separate list (.ovhgs). When calling get_cutsites, you access the positions of the cuts in the same order as you found them, so .ovhgs.pop(0) yields the ovhg in the same order.

I don't see a dangerous scenario for this being used in Dseq.get_cutsites since they are called one after the other:

cuts_watson = [c - 1 for c in e.search(self, linear=(not self.circular))]
out += [((w, e.ovhg), e) for w in cuts_watson]
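A stripped-down sketch of the pattern, independent of pydna and Biopython: `ToyEnzyme` is a made-up class, and the value it stores per site is just a placeholder, not pydna's overhang convention. It only shows how a `@property` that pops from a list hands out values in the same order as `search` found the sites.

```python
class ToyEnzyme:
    """Hypothetical stand-in for an enzyme whose overhang differs per site."""

    def __init__(self):
        self.ovhgs = []

    def search(self, seq: str) -> list[int]:
        """Return 1-based positions of every U and queue one value per site."""
        self.ovhgs = []  # reset so stale values from an earlier search cannot leak
        cuts = []
        for i, base in enumerate(seq):
            if base in "Uu":
                cuts.append(i + 1)
                self.ovhgs.append(i + 1)  # placeholder per-site overhang value
        return cuts

    @property
    def ovhg(self) -> int:
        # Each read consumes the next queued value, in search order
        return self.ovhgs.pop(0)


e = ToyEnzyme()
out = []
cuts_watson = [c - 1 for c in e.search("cccUaaaaU")]
out += [((w, e.ovhg), e) for w in cuts_watson]
```

This works only because `ovhg` is read exactly once per site, right after `search`. Reading it more often raises an `IndexError` on the empty list, and reading it less often leaves leftover values until the next `search` resets them.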

Regarding the "edge-case" situations (multiple u on the same strand / circular sequences), I think for now raising an error should be enough. I don't see a use-case to support those scenarios.

cccUaaaUaaa
tttGtttGttt

What should this give? I don't think it's worth supporting.

cccU

aaaU

cccUaaaU < This would probably be there as well, since USER probably does not cut if there is no double strand (if it cuts the second U first)

        aaa
tttGtttGttt

Of note, don't use .watson and .crick when calling search, because the cut coordinates are with respect to the "full sequence" (search for "full sequence" in the cutsite_pairs notebook).

dgruano commented 1 month ago

Wow! I just learnt a lot of things! I realized the problem with using the search function was that ovhg had to be "dynamic", but I didn't think it could actually be done.

And yeah, my bad! Did not realize that Dseq.reverse_complement() gets the ACTUAL reverse complement of the Dseq object but keeping the watson coordinates. Very happy to get to know pydna better.

Will write some tests, merge your PR, implement the nickase and push changes to this PR. Thanks for looking into this!!!

hiyama341 commented 1 month ago

You guys are so cool. Good work!

dgruano commented 3 weeks ago

USER works like a charm. I will implement the nickase and publish the PR for review.