Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License
213 stars 38 forks source link

Fix AvoidHairpins to search within last window #37

Closed NikashS closed 3 years ago

NikashS commented 4 years ago

I noticed that the AvoidHairpins specification does not search for hairpins within the last window. Intuitively, if I set my window size to be the same as my sequence length, the algorithm should find every hairpin in the sequence. In reality, though, this algorithm will not find any hairpins.

Here is some updated pseudocode that I briefly tested, it seems to be working:

for i in range(len(sequence) - stem_size:
    word = sequence[i : i + stem_size]
    if len(sequence) - i < window:
        window = len(sequence) - i
    if stem_size > window / 2:
        break
    rest = reverse[-(i + window) : -(i + stem_size)]
    if word in rest:
        // follow existing algorithm
Zulko commented 4 years ago

Thanks, if that's true that's a great catch! Do you have minimal code and sequence to reproduce the issue?

NikashS commented 4 years ago

Don't have minimal code but if I run codon optimization with an avoid hairpin constraint on the sequence "attcaatgggggggggggggggggggggggggtagccta" with parameters stem_size=3 and window=8, it will detect the first hairpin pair but not the one near the end of the sequence.

Zulko commented 4 years ago

Thanks, it looks like you were right and you had the right fix (the mistake was to stop at i-hairpin_window instead of i-stem-size). I went ahead and pushed a commit to master which applies the fix (and a test for it) to save you a few clicks @veghp ! @NikashS let us know how it works for you.

veghp commented 4 years ago

Thank you! I tried and it changes the sequence to 'ATTCACTGGGGGGGGGGGGGGGGGGGGGGGGGTAGCCGC' so it works fine now.

Tests pass too.

prusnak commented 3 years ago

I think this issue should be closed (as per comment above).