BjornFJohansson / pydna

Clone with Python! Data structures for double stranded DNA & simulation of homologous recombination, Gibson assembly, cut & paste cloning.
Other
160 stars 39 forks source link

Feature error in Dseqrecord::__getitem__ when slicing circular molecules #160

Closed manulera closed 5 months ago

manulera commented 7 months ago

When slicing a circular sequence, sometimes features that should not be retained are retained, example below:


seqRecord = Dseqrecord("aaagGTACCTTTGGATCcggg", circular=True)
f1 = SeqFeature(SimpleLocation(4, 17), type="misc_feature", strand=1)
seqRecord.features.append(f1)
print(len(seqRecord[13:8].features))

The reason why it happens is that when calling __getitem__ in Dseqrecord:

elif self.circular and sl_start > sl_stop:
    answer.features = self.shifted(sl_start).features
    answer.features = [f for f in answer.features if f.location.parts[-1].end <= answer.seq.length]

The shifting turns the feature [4:17](+) into join{[12:21](+), [0:4](+)}, and f.location.parts[-1].end <= answer.seq.length is then smaller than the length, even if the feature spans beyond the end of the slice.

It can be fixed like this.

elif self.circular and sl_start > sl_stop:
    answer.features = self.shifted(sl_start).features
    # origin-spanning features should only be included after shifting
    # in cases where the slice comprises the entire sequence, but then
    # sl_start == sl_stop and the second condition is not met
    answer.features = [f for f in answer.features if (
                       f.location.parts[-1].end <= answer.seq.length and
                       f.location.parts[0].start <= f.location.parts[-1].end)]

I think there are some other edge cases but I will open a separate issue for those.

BjornFJohansson commented 7 months ago

Can we add this already?

manulera commented 7 months ago

This is in this commit: https://github.com/BjornFJohansson/pydna/commit/7ce18818cc2a6547125fbd735cf2c2e634040c51

This is already included in the branch cutsite_pairs, but not sure you will be merging that in the next release. I can try to re-base if needed, but if you are planning to include that branch in the next release it should be fine