Currently, for a circular sequence seq, seq[0:0] or seq[1:1] returns the linearised version of that sequence. That makes sense, but it is problematic in certain cases. Imagine a function that returns the first x nucleotides after a given base: it would give an unexpected result for x == 0, because seq[0:0+0] returns the entire sequence rather than an empty one.
Knowing that this is the behaviour, it is possible to add a special case to the function for x == 0, but that is still not great.
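To make the pitfall concrete, here is a minimal sketch that uses a plain Python string as a stand-in for a circular sequence. The names circular_slice_current and first_x_after are hypothetical and are not part of pydna; the slicing rule is only a toy model of the behaviour described above.

```python
def circular_slice_current(seq: str, start: int, end: int) -> str:
    """Toy model of the current rule on a plain string (not pydna code).

    Assumes 0 <= start, end <= len(seq). When start == end the result is
    the whole sequence, linearised at start.
    """
    if start < end:
        return seq[start:end]
    # start >= end: wrap around the origin (start == end gives a full turn)
    return seq[start:] + seq[:end]


def first_x_after(seq: str, pos: int, x: int) -> str:
    """Hypothetical helper: the x nucleotides starting at position pos."""
    end = (pos + x) % len(seq)  # keep the end index inside the sequence
    return circular_slice_current(seq, pos, end)


seq = "GATTACAGCT"  # length 10

print(first_x_after(seq, 0, 3))  # GAT
print(first_x_after(seq, 7, 5))  # GCTGA (wraps around the origin)
print(first_x_after(seq, 0, 0))  # GATTACAGCT, not an empty string
```

The last call is the problematic case: asking for zero nucleotides gives back the full sequence unless the helper special-cases x == 0.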
Possible alternative
We could support slicing of circular molecules with indexes bigger than the length of the sequence. For instance, what is now written as seq[7:2] for a sequence of length 10 could instead be written as seq[7:12]. This is equivalent to the behaviour of a circular string, and it would potentially allow interesting functionality, such as getting more than a full circle. In the same example of a sequence of length 10, seq[1:15] could return more than one loop.
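A rough sketch of what this could look like, again on a plain string rather than a pydna object. The function circular_slice_extended is hypothetical, and treating start == end as an empty slice is an assumption about how the proposal would resolve the x == 0 case.

```python
def circular_slice_extended(seq: str, start: int, end: int) -> str:
    """Sketch of the proposed rule on a plain string (not pydna code).

    Assumes non-negative indexes. The end index may exceed len(seq);
    start == end is assumed to give an empty result.
    """
    n = len(seq)
    if end < start:
        # Old syntax, e.g. [7:2] on length 10: read it as [7:12]
        end += n
    # Walk the circle one position at a time, wrapping with modulo
    return "".join(seq[i % n] for i in range(start, end))


seq = "GATTACAGCT"  # length 10

print(circular_slice_extended(seq, 7, 12))  # GCTGA
print(circular_slice_extended(seq, 7, 2))   # GCTGA (old syntax, same result)
print(circular_slice_extended(seq, 0, 0))   # empty string
print(circular_slice_extended(seq, 1, 15))  # ATTACAGCTGATTA (more than one loop)
```

With this rule, a helper that slices from pos to pos + x needs neither a modulo on the end index nor a special case for x == 0.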
The problem
When we discussed this the other day, we said that a lot of pydna functions use modulo operations to avoid expressions like seq[7:12] and produce seq[7:2] instead, so this change may break some code, even if both syntaxes are still supported. The tests would most likely pick up the errors introduced, but it is a breaking change for other users, so it should be postponed until a major release.
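One place where modulo-based code could change behaviour, sketched with a hypothetical helper (wrapped_end is not a pydna function): a zero-length request and a full-turn request both reduce to the same pair of indexes. Today both return the whole sequence; assuming start == end returned an empty sequence after the change, callers relying on the full-turn result would silently get something different.

```python
def wrapped_end(start: int, length: int, n: int) -> int:
    """Hypothetical helper: end index as modulo-based code would compute it."""
    return (start + length) % n


n = 10     # sequence length
start = 3

end_for_nothing = wrapped_end(start, 0, n)    # request 0 nucleotides
end_for_full_turn = wrapped_end(start, n, n)  # request a full turn

# Both requests collapse to the slice [3:3], so the slicing code
# cannot tell them apart from the indexes alone.
print(end_for_nothing, end_for_full_turn)  # 3 3
```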
A followup to #161