BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Other
160 stars 39 forks source link

Support for ligation of partially overlapping overhangs? #158

Open manulera opened 7 months ago

manulera commented 7 months ago

Hi @BjornFJohansson,

Me and @dgruano were wondering what would be the best way to ligate fragments with partially overlapping overhangs.

I guess a possibility would be to write a custom alogrithm function for Assembly is there something already there?

@gruano for context, the class Assembly accepts a function as an argument that returns the regions that will be joined.

Here are the built-in algorithms:

https://github.com/BjornFJohansson/pydna/blob/master/src/pydna/common_sub_strings.py

BjornFJohansson commented 7 months ago

There is a new ligation module coming up! Ill push it and let you know.

BjornFJohansson commented 7 months ago

check out the "ligate" branch. It has a new "ligate" module.

BjornFJohansson commented 7 months ago

Maybe I misunderstood. Is it ligation or homologous recombination? A custom algorithm would certainly be possible, but I am not sure how to model the assembly with imperfect ssDNA domains.

manulera commented 7 months ago

I am using the new implementation of the assembly for this, using these two functions for the algorithm, using a slightly modified version of @dgruano's function that returns the overlap length rather than true/false.

def sum_is_sticky(seq1: Dseq, seq2: Dseq, partial: bool=False) -> int:
    """Return true if the 3' end of seq1 and 5' end of seq2 ends are sticky and compatible for ligation."""
    type_seq1, sticky_seq1 = seq1.three_prime_end()
    type_seq2, sticky_seq2 = seq2.five_prime_end()

    if 'blunt' != type_seq2 and type_seq2 == type_seq1 and str(sticky_seq2) == str(reverse_complement(sticky_seq1)):
        return len(sticky_seq1)

    if not partial:
        return 0

    if type_seq1 != type_seq2 or type_seq2 == "blunt":
        return 0
    elif type_seq2 == "5'":
        sticky_seq1 = str(reverse_complement(sticky_seq1))
    elif type_seq2 == "3'":
        sticky_seq2 = str(reverse_complement(sticky_seq2))

    ovhg_len = min(len(sticky_seq1), len(sticky_seq2))
    for i in range(1, ovhg_len+1):
        if sticky_seq1[-i:] == sticky_seq2[:i]:
            return i
    else:
        return 0

def sticky_end_sub_strings(seqx: _Dseqrecord, seqy: _Dseqrecord, limit=0):
    """For now, if limit 0 / False only full overlaps are considered."""
    overlap = sum_is_sticky(seqx.seq, seqy.seq, limit )
    if overlap:
        return [(len(seqx)-overlap, 0, overlap)]
    return []