BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Other
166 stars 45 forks source link

Inconsistent order of fragments after restriction #78

Closed manulera closed 1 year ago

manulera commented 2 years ago

Hello,

I noticed that when calling the function cut(), the order of the returned fragments depends on the order of the enzymes, and the behaviour is different for Dseq and Dseqrecord.

Different behaviour for Dseq and Dseqrecord (bug?)

The difference between Dseq and Dseqrecord when using a RestrictionBatch as input is due to this line: https://github.com/BjornFJohansson/pydna/blob/9b2c2d23088442e20fc328b10eb16bf6a03788df/src/pydna/dseqrecord.py#L955

Which should be

frags = self.seq.cut(*enzymes)

This should be considered a bug, I think, since it is not the expected behaviour.

Order of the output depends on order of the input arguments if using a list of RestrictionTypes

Then, when using either a list of RestrictionTypes, whether the list is unpacked or not, the order of the fragments depends on the order of the objects in the list. Here a minimal example (folded):

Click to expand! ```python from Bio.Restriction.Restriction import CommOnly,RestrictionBatch,RestrictionType from pydna.dseqrecord import Dseqrecord from pydna.dseq import Dseq enzyme_names = ['BamHI', 'EcoRV'] enzymes = [CommOnly.format(e) for e in enzyme_names] for Class in [Dseq,Dseqrecord]: print('with class: ' + str(Class)) object = Class('AAAGGATCCAAAAGATATCAAAAA', linear=False) # With a list of enzymes print(' >','List of enzymes') fragments = object.cut(enzymes) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) fragments = object.cut(enzymes[::-1]) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) # With a list of enzymes print(' >','Unpacked list of enzymes') fragments = object.cut(*enzymes) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) fragments = object.cut(*enzymes[::-1]) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) # With a RestrictionBatch print(' >','Restriction batch') fragments = object.cut(RestrictionBatch(first=enzymes)) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) fragments = object.cut(RestrictionBatch(first=enzymes[::-1])) if Class==Dseqrecord: print(' ',fragments[0].seq) else: print(' ',fragments[0]) ```

For unpacked lists, this could be fixed by converting the tuple enzymes into a RestrictionBatch adding this:

if len(enzymes) > 1:
            enzymes = (_RestrictionBatch(first=enzymes),)

Just before this line: https://github.com/BjornFJohansson/pydna/blob/9b2c2d23088442e20fc328b10eb16bf6a03788df/src/pydna/dseq.py#L1383

Finally, the function does not seem to be meant to take lists that are not unpacked, but it still works for them, so probably something should be done about that. If so, this else case could be removed, and all outputs from the minimal example I attached should work:

https://github.com/BjornFJohansson/pydna/blob/9b2c2d23088442e20fc328b10eb16bf6a03788df/src/pydna/dseq.py#L1397-L1406

I could implement the proposed changes along with tests, if that's ok.

manulera commented 2 years ago

I made a fork and realised that the second point (order of enzymes in the list) is the intended behaviour, from the tests:

https://github.com/BjornFJohansson/pydna/blob/9b2c2d23088442e20fc328b10eb16bf6a03788df/tests/test_module_dseq.py#L535

https://github.com/BjornFJohansson/pydna/blob/9b2c2d23088442e20fc328b10eb16bf6a03788df/tests/test_module_dseq.py#L505-L511

But then, does this make sense for a situation where the enzymes would cut more than once?

In any case, in the following pull request I propose the fix to the different behaviour for Dseq and Dseqrecord.

BjornFJohansson commented 2 years ago

thanks for your input, ill have time to look into this in about a week or so.

BjornFJohansson commented 2 years ago

Basically, the idea is to differentiate between an argument that is basically a set of enzymes that has no order and an argument that has order such as a list.

In the first case, all enzymes are tested and sorted by the order they cut. Then fragments are produced based on this order. In the second case, the order that the user adds the enzymes to the list is respected.

As you point out, neither of these really reflects reality to 100%. This is why I implemented both options.

BjornFJohansson commented 1 year ago

This is no longer valid as of version 5.2.0. Feeding a list or set of enzymes now produces the same result. Feel free to reopen.