BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Edge case in Anneal.products #197

Closed manulera closed 5 months ago

manulera commented 5 months ago

Hi @BjornFJohansson, this is related to #191. I am documenting it here in case you can think of other places where something similar could happen.

In Anneal.products, this is the line that builds the product sequence from the positions where the primers anneal:

prd = _Dseqrecord(fp) + tpl[fp.position : rp.position] + _Dseqrecord(rp).reverse_complement()

Because of what is described in #161, when fp.position was equal to rp.position, the slice tpl[fp.position : rp.position] used to return an empty sequence for every value except zero (see the former behaviour of Dseqrecord.__getitem__ below).

print(Dseqrecord('AAAA', circular=True)[0:0].seq)
# AAAA

print(Dseqrecord('AAAA', circular=True)[1:1].seq)
# Empty sequence <<<<<<<<<<<<<

Now, in a circular sequence, if both indices are equal, the whole linearised sequence is returned, which is not what you want in this case. This is what happens with the anneal below:

cacatacgatttaggtgacactatagaac
CACATCCGAACATAAACAACCCACATACGATTTAGGTGACACTATAGAAC
                             ggttgtttatgttcggatgtg

> What you would want: cacatacgatttaggtgacactatagaaccacatccgaacataaacaacc

> What you get: cacatacgatttaggtgacactatagaacCACATCCGAACATAAACAACCCACATACGATTTAGGTGACACTATAGAACcacatccgaacataaacaacc

To fix this, I have added an extra if statement in this case:

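if fp.position == rp.position:
    # The primers abut, so there is no template between them to insert
    prd = _Dseqrecord(fp) + _Dseqrecord(rp).reverse_complement()
else:
    # as before
    prd = _Dseqrecord(fp) + tpl[fp.position : rp.position] + _Dseqrecord(rp).reverse_complement()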
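
For anyone who wants to see the effect without installing pydna, here is a minimal sketch on plain strings. It is not pydna's implementation: `circular_slice`, `revcomp`, and `product` are hypothetical helpers, and I am assuming both primer positions come out as 0 for the example above (the forward primer ends at the origin of the circular template and the reverse primer starts there).

```python
def circular_slice(seq: str, start: int, end: int) -> str:
    """Hypothetical stand-in for slicing a circular sequence.

    Mimics the new behaviour: equal indices return the whole
    sequence, linearised at that index.
    """
    if start < end:
        return seq[start:end]
    # start >= end: wrap around the origin (start == end gives everything)
    return seq[start:] + seq[:end]


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string, preserving case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


def product(tpl: str, fwd: str, rev: str, fp_pos: int, rp_pos: int,
            guard: bool = True) -> str:
    """Assemble primer + template slice + reverse-complemented primer."""
    if guard and fp_pos == rp_pos:
        # Primers abut: no template between them
        return fwd + revcomp(rev)
    return fwd + circular_slice(tpl, fp_pos, rp_pos) + revcomp(rev)


tpl = "CACATCCGAACATAAACAACCCACATACGATTTAGGTGACACTATAGAAC"
fwd = "cacatacgatttaggtgacactatagaac"
rev = "ggttgtttatgttcggatgtg"

# Assumed positions: both 0 on the circular template
unguarded = product(tpl, fwd, rev, 0, 0, guard=False)
guarded = product(tpl, fwd, rev, 0, 0)

print(unguarded)  # the full template ends up sandwiched between the primers
print(guarded)    # just the two primers joined
```

With the guard disabled, the string reproduces the "What you get" product; with the guard, it reproduces the "What you would want" product.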