BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Return error if argument overlap is too high in assembly_fragments #265

Open manulera opened 2 months ago

manulera commented 2 months ago

Right now, no error is raised if the value passed as overlap to assembly_fragments is too high. For instance, you can pass a number larger than the length of the template sequence itself, and the function will simply return the homology up to the end of the template sequence.

I think this should raise a ValueError, because the argument is being silently ignored.

This can be worked on during the hackathon.

from pydna.dseqrecord import Dseqrecord
from pydna.design import primer_design, assembly_fragments

templates = [
    Dseqrecord('AAACAGTAATACGTTCCTTTTTTATGATGATGGATGACATTCAAAGCACTGATTCTAT'),
    Dseqrecord('AAGGACAACGTTCCTTTTTTATGATATATATGGCACAGTATGATCAAAAGTTAAGTAC'),
]

homology_length = 2000  # deliberately huge, far longer than either template
minimal_hybridization_length = 15
target_tm = 55

initial_amplicons = [
    primer_design(template, limit=minimal_hybridization_length, target_tm=target_tm) for template in templates
]

assembly_amplicons = assembly_fragments(initial_amplicons, overlap=homology_length)

for a in assembly_amplicons:
    for p in a.primers():
        print(p.seq)

manulera commented 1 month ago

What do you think of this one, @BjornFJohansson? Do you agree it should raise an error?

BjornFJohansson commented 1 month ago

Wouldn't a warning be enough? The fragments in your example are what you would expect. Which argument is being ignored? If we go for an error, it needs to say what the longest permitted overlap is.

manulera commented 1 month ago

What's being ignored is the parameter homology_length (overlap in assembly_fragments), because the primer overlap will be the maximum possible, but not 2000.

In this case the value is not realistic, but if you pass a value for overlap, you would not expect to get back primers with a shorter overlap than that. That's why I think an error would make sense.

BjornFJohansson commented 1 month ago

OK, the error should probably tell the user what the maximum permitted overlap is for the particular assembly. I suggest the length of the shortest fragment in the assembly list that is longer than the maxlink limit.
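
A minimal sketch of that rule, written in plain Python over fragment lengths rather than against pydna itself. The helper names max_permitted_overlap and check_overlap are hypothetical and not part of pydna. The sketch also assumes that fragments no longer than maxlink are treated as linkers and excluded from the limit, and it takes maxlink as an explicit argument instead of assuming a default value.

```python
def max_permitted_overlap(fragment_lengths, maxlink):
    """Return the length of the shortest fragment that is longer than maxlink.

    Hypothetical helper: fragments of length <= maxlink are assumed to be
    linkers and are ignored when computing the limit.
    """
    candidates = [n for n in fragment_lengths if n > maxlink]
    if not candidates:
        raise ValueError("no fragment is longer than maxlink")
    return min(candidates)


def check_overlap(fragment_lengths, overlap, maxlink):
    """Raise ValueError if overlap exceeds the limit; otherwise return it."""
    limit = max_permitted_overlap(fragment_lengths, maxlink)
    if overlap > limit:
        raise ValueError(
            f"overlap={overlap} is too high: the maximum permitted overlap "
            f"for this assembly is {limit}"
        )
    return overlap


# The two templates from the example in this issue, as plain strings.
templates = [
    "AAACAGTAATACGTTCCTTTTTTATGATGATGGATGACATTCAAAGCACTGATTCTAT",
    "AAGGACAACGTTCCTTTTTTATGATATATATGGCACAGTATGATCAAAAGTTAAGTAC",
]
lengths = [len(t) for t in templates]

try:
    check_overlap(lengths, overlap=2000, maxlink=40)
except ValueError as err:
    print(err)  # the message names the largest overlap that would be accepted
```

Working on lengths keeps the check independent of how the function builds the primers, so it could run before any primer design takes place.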

manulera commented 1 month ago

I tried for a while with this one, but the function is a bit convoluted. I think unless we refactor it, it would be too cumbersome to test all scenarios.

I would say let's just leave it as is, since the error will be caught downstream if you impose the same overlap in the Gibson assembly. Alternatively, we could compute the common substrings between adjacent fragments and check the length of the overlaps.
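
A rough sketch of that alternative, as a simplified stand-in for a real common-substring search. It works on plain strings, not on pydna objects, and only looks at terminal overlaps: the longest suffix of one fragment that is also a prefix of the next. The function names terminal_overlap and check_terminal_overlaps are hypothetical, and the sketch assumes a linear assembly with fragments given in order and in the same orientation.

```python
def terminal_overlap(upstream, downstream):
    """Length of the longest suffix of upstream that is a prefix of downstream."""
    longest = min(len(upstream), len(downstream))
    for size in range(longest, 0, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


def check_terminal_overlaps(sequences, overlap):
    """Raise ValueError if any adjacent pair shares less than `overlap` bases.

    Returns the list of overlap lengths found, one per adjacent pair.
    """
    found = []
    for i, (up, down) in enumerate(zip(sequences, sequences[1:])):
        size = terminal_overlap(up.upper(), down.upper())
        if size < overlap:
            raise ValueError(
                f"fragments {i} and {i + 1} overlap by {size} bases, "
                f"fewer than the requested {overlap}"
            )
        found.append(size)
    return found


# Toy fragments: the first ends with CCGGG and the second starts with CCGGG.
fragments = ["AAACCCGGG", "CCGGGTTT"]
print(check_terminal_overlaps(fragments, overlap=5))  # [5]
```

Running a check like this on the sequences of the returned amplicons would catch a requested overlap that was not honoured, without having to refactor assembly_fragments.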