BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.

Next getitem implementation for circular sequences #191

Open manulera opened 5 months ago

manulera commented 5 months ago

A followup to #161

The problem

Currently, for a circular sequence seq, seq[0:0] or seq[1:1] returns the linearised version of that sequence. That makes sense, but it is problematic in certain cases. Imagine a function that returns the first x nucleotides after a given base: it would give an unexpected result for x == 0, because seq[0:0+0] would return the entire sequence rather than an empty one.

Knowing that this is the behaviour, it is possible to add a special case to the function for x == 0, but it's still not great.
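
To make the corner case concrete, here is a minimal sketch on a plain Python string. It is a toy model of the behaviour described above, not pydna's implementation; `circular_slice_current`, `first_x_after` and `first_x_after_safe` are hypothetical names used only for illustration.

```python
def circular_slice_current(seq: str, start: int, stop: int) -> str:
    """Toy model of the current circular slicing, on a plain string."""
    if start < stop:
        return seq[start:stop]
    # start >= stop: wrap around the origin. When start == stop this
    # yields the whole sequence, linearised at that position.
    return seq[start:] + seq[:stop]


def first_x_after(seq: str, pos: int, x: int) -> str:
    """Naive helper: the x characters starting at pos, wrapping if needed."""
    return circular_slice_current(seq, pos, (pos + x) % len(seq))


def first_x_after_safe(seq: str, pos: int, x: int) -> str:
    """Same helper with the special case for x == 0 added by hand."""
    if x == 0:
        return ""
    return first_x_after(seq, pos, x)


seq = "0123456789"
print(first_x_after(seq, 8, 4))       # 8901
print(first_x_after(seq, 0, 0))       # 0123456789 (the surprising case)
print(first_x_after_safe(seq, 0, 0))  # empty string
```

The special case works, but every function that computes slice bounds has to remember it.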

Possible alternative

We could support slicing of circular molecules with indexes bigger than the length of the sequence. For instance, what is now written as seq[7:2] for a sequence of length 10 could be written as seq[7:12]. This is equivalent to the behaviour of a circular string, and would potentially allow interesting functionality, such as getting more than a full circle. In the same example of a sequence of length 10, seq[1:15] could return more than one loop.
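
A rough sketch of what these semantics could look like, again on a plain string and assuming 0 <= start and a non-empty sequence. `circular_slice_extended` is a hypothetical name, not an existing pydna function.

```python
def circular_slice_extended(seq: str, start: int, stop: int) -> str:
    """Read positions start .. stop - 1 around the circle, modulo len(seq)."""
    if not seq:
        raise ValueError("cannot slice an empty circular sequence")
    n = len(seq)
    # range() is empty when stop <= start, so seq[0:0] gives "".
    return "".join(seq[i % n] for i in range(start, stop))


seq = "0123456789"
print(circular_slice_extended(seq, 7, 12))  # 78901
print(circular_slice_extended(seq, 1, 15))  # 12345678901234 (more than one loop)
print(circular_slice_extended(seq, 0, 0))   # empty string
```

With this convention the x == 0 case from above needs no special handling, since the slice length is always stop - start.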

The problem with this alternative

When we discussed this the other day, we said that a lot of pydna functions use modulo operations to avoid expressions like seq[7:12] and produce seq[7:2] instead, so this change may break some code, even if both syntaxes are still supported. The tests would most likely pick up the errors introduced, but it is a breaking change for other users, so it should be postponed until a major release.
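
As an illustration of where the two conventions can diverge, here is a small sketch that maps a wrapped pair to the extended form. `legacy_to_extended` is a hypothetical helper, and the claim about start == stop is my reading of the behaviour described above rather than something checked against the code base.

```python
def legacy_to_extended(start: int, stop: int, n: int) -> tuple[int, int]:
    """Map a wrapped pair such as (7, 2) to the extended form (7, 12)."""
    if not (0 <= start < n and 0 <= stop <= n):
        raise ValueError("indices must lie within the sequence")
    if stop < start:
        # The slice wraps around the origin: push stop past the end.
        return start, stop + n
    # stop >= start is returned unchanged, including start == stop.
    return start, stop


n = 10
print(legacy_to_extended(7, 2, n))  # (7, 12)
print(legacy_to_extended(2, 7, n))  # (2, 7)
print(legacy_to_extended(3, 3, n))  # (3, 3)
```

Pairs with stop < start translate unambiguously, so both syntaxes could coexist for them. The pair with start == stop is the one that would change meaning: today it means the whole circle, whereas under the extended convention it would be empty. Code that computes `(start + length) % n` and relies on the old reading when length equals n is the kind of case that could break.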