Open manulera opened 3 months ago
This was by design. It is there so that we can make staggered sequences like so:
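from pydna.dseq import Dseq
from pydna.utils import rc

seq1 = "ACGGCAGCCCGT"
seq2 = rc(seq1)  # reverse complement of seq1
# Add a 5' single-stranded extension to each strand
seq1_padded = "aaa" + seq1
seq2_padded = "ccc" + seq2
# No overhang is passed, so Dseq works out the stagger itself
dseq1 = Dseq(seq1_padded, seq2_padded)
print(repr(dseq1))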
Dseq(-18)
aaaACGGCAGCCCGT
   TGCCGTCGGGCAccc
Does this create problems in other use cases? Maybe a warning would be appropriate.
Hi @BjornFJohansson, in my example the returned sequence has mismatches at both ends; that's the problematic bit.
If you are manually typing both strands, you may make a mistake when typing one of them, and you may want to get an error in that case.
You can create a sequence with mismatches and stagger by passing the overhang explicitly, but I'm not sure the automatic overhang detection should return sequences with mismatches. In general, I'd guess most pydna functions will behave unexpectedly if there are mismatches, so I think an error would be good.
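To make the "raise an error on mismatches" idea concrete, here is a minimal pure-Python sketch. It does not use or reproduce pydna internals: `find_mismatches` and `check_complementary` are hypothetical names, and the sign convention for `ovhg` (negative means the Watson strand protrudes at its 5' end) is an assumption based on how the `Dseq(-18)` example above reads.

```python
# Hypothetical helpers for illustration only; not part of pydna.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def find_mismatches(watson, crick, ovhg):
    """Return the columns where the paired bases are not complementary.

    Assumed convention: ovhg < 0 means watson has -ovhg unpaired bases
    at its 5' end; ovhg > 0 means crick protrudes by ovhg on the left.
    """
    # Crick as displayed under watson (3'->5'), complemented so that a
    # perfect pair shows the same letter on both strands.
    bottom = crick[::-1].translate(COMPLEMENT)
    w_start = max(ovhg, 0)   # column where watson begins
    b_start = max(-ovhg, 0)  # column where the bottom strand begins
    start = max(w_start, b_start)
    end = min(w_start + len(watson), b_start + len(bottom))
    return [
        col
        for col in range(start, end)
        if watson[col - w_start].upper() != bottom[col - b_start].upper()
    ]


def check_complementary(watson, crick, ovhg):
    """Raise ValueError if any paired position is mismatched."""
    mismatches = find_mismatches(watson, crick, ovhg)
    if mismatches:
        raise ValueError(f"mismatched base pairs at columns {mismatches}")


# The staggered example from this thread pairs cleanly at ovhg=-3.
print(find_mismatches("aaaACGGCAGCCCGT", "cccACGGGCTGCCGT", -3))  # → []
```

With a check like this, a typo in a manually typed strand would surface as an error instead of a silently mismatched duplex.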
Hi @BjornFJohansson, I was wondering whether we want to support this kind of behaviour for Dseq, or whether it is unintended.
gives a dseq with mismatches
I wonder if we should restrict the representation to have no mismatches (e.g. use terminal_overlap instead of common_substrings)? Or raise an error if one like this comes up?
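As a rough sketch of what "no mismatches" could mean for the automatic stagger search, the snippet below tries every possible alignment of the two strands and keeps only those where every paired base matches. This is an assumption about the desired behaviour, not how pydna's terminal_overlap or common_substrings work; `perfect_staggers` and its `shift` value are hypothetical.

```python
# Hypothetical helper for illustration only; not part of pydna.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def perfect_staggers(watson, crick):
    """List (shift, overlap_length) for every mismatch-free alignment.

    shift is the number of watson bases to the left of the first base
    of the displayed bottom strand (negative if the bottom strand
    protrudes on the left instead).
    """
    top = watson.upper()
    # Crick as displayed (3'->5'), complemented for direct comparison.
    bottom = crick[::-1].translate(COMPLEMENT).upper()
    hits = []
    for shift in range(-len(bottom) + 1, len(top)):
        start = max(shift, 0)
        end = min(len(top), shift + len(bottom))
        # Accept the alignment only if all paired columns agree.
        if all(top[i] == bottom[i - shift] for i in range(start, end)):
            hits.append((shift, end - start))
    return hits


hits = perfect_staggers("aaaACGGCAGCCCGT", "cccACGGGCTGCCGT")
# Pick the alignment with the longest fully paired region.
print(max(hits, key=lambda hit: hit[1]))  # → (3, 12)
```

If the list comes back empty, the two strands cannot be paired without mismatches at any stagger, which is the case where an error (or at least a warning) seems appropriate.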