Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

EnforcePatternOccurence raises IndexError (with some specific parameters) #52

Closed ghost closed 3 years ago

ghost commented 3 years ago

Related to #51.

from dnachisel import *
from Bio.SeqFeature import FeatureLocation

seq="ATGCAGAGCAAGGTGCTGCTGGCCGTCGCCCTGTGGCTCTGCGTGGAGACCCGGGCCGCC"

problem = DnaOptimizationProblem(
    sequence=seq,
    constraints=[
        EnforcePatternOccurence( pattern='CTG',occurences=2,strand=-1,location=FeatureLocation(start=0,end=50))
    ],
)

print(problem.constraints_text_summary())
===> FAILURE: 1 constraints evaluations failed
 FAIL ┍ EnforcePatternOccurence[0-50(-)](CTG, occurence, 2)
      │ Failed. Pattern found 1 times instead of 2 wanted, at locations [3-6(-)]

problem.resolve_constraints()
/usr/local/lib/python3.8/site-packages/dnachisel/DnaOptimizationProblem/mixins/ConstraintsSolverMixin.py in resolve_constraints(self, final_check, cst_filter)
    355         ):
    356             try:
--> 357                 self.resolve_constraint(constraint=constraint)
    358             except NoSolutionError as error:
    359                 self.logger(constraint__index=len(constraints))

/usr/local/lib/python3.8/site-packages/dnachisel/DnaOptimizationProblem/mixins/ConstraintsSolverMixin.py in resolve_constraint(self, constraint)
    301                 try:
    302                     if hasattr(constraint, "resolution_heuristic"):
--> 303                         constraint.resolution_heuristic(local_problem)
    304                     else:
    305                         local_problem.resolve_constraints_locally()

/usr/local/lib/python3.8/site-packages/dnachisel/builtin_specifications/EnforcePatternOccurence.py in resolution_heuristic(self, problem)
    183                         mutation_space=problem.mutation_space,
    184                     )
--> 185                     new_occurence_cst.insert_pattern_in_problem(new_problem)
    186                 problem.sequence = new_problem.sequence
    187                 return
constraint:   0%|                                                                 | 0/1 [00:00<?, ?it/s, now=EnforcePatternOccurence[0...]
/usr/local/lib/python3.8/site-packages/dnachisel/builtin_specifications/EnforcePatternOccurence.py in insert_pattern_in_problem(self, problem, reverse)
    132                 sequence=sequence_to_insert, location=new_location
    133             )
--> 134             new_space = MutationSpace.from_optimization_problem(
    135                 problem, new_constraints=[new_constraint]
    136             )

/usr/local/lib/python3.8/site-packages/dnachisel/MutationSpace/MutationSpace.py in from_optimization_problem(problem, new_constraints)
    184             constraints = new_constraints
    185         mutation_choices = sorted(
--> 186             [
    187                 MutationChoice(segment=choice[0], variants=set(choice[1]))
    188                 for cst in constraints

/usr/local/lib/python3.8/site-packages/dnachisel/MutationSpace/MutationSpace.py in <listcomp>(.0)
    187                 MutationChoice(segment=choice[0], variants=set(choice[1]))
    188                 for cst in constraints
--> 189                 for choice in cst.restrict_nucleotides(sequence)
    190             ],
    191             key=lambda choice: (choice.end - choice.start, choice.start),

/usr/local/lib/python3.8/site-packages/dnachisel/builtin_specifications/EnforceSequence.py in restrict_nucleotides(self, sequence, location)
    107         if self.location.strand == -1:
    108             lend = self.location.end
--> 109             return [(i, set(reverse_complement(n) for n in
    110                             IUPAC_NOTATION[self.sequence[lend - i]]))
    111                     for i in range(start, end)]

/usr/local/lib/python3.8/site-packages/dnachisel/builtin_specifications/EnforceSequence.py in <listcomp>(.0)
    108             lend = self.location.end
    109             return [(i, set(reverse_complement(n) for n in
--> 110                             IUPAC_NOTATION[self.sequence[lend - i]]))
    111                     for i in range(start, end)]
    112         else:

IndexError: string index out of range
Zulko commented 3 years ago

Haven't tried it but that's possibly the same issue as the previous one: you are using biopython's FeatureLocation class instead of DnaChisel's Location.

ghost commented 3 years ago

> Haven't tried it but that's possibly the same issue as the previous one: you are using biopython's FeatureLocation class instead of DnaChisel's Location.

The related issue was solved by using DnaChisel's Location. However, using it here (for EnforcePatternOccurence) doesn't work!

Zulko commented 3 years ago

Sorry about this @FadiBakoura, this spec uses a special algorithm. It works for strand=1, but you must be the first in a long time to try it with strand=-1, and there is no test for that value :disappointed:.

@veghp I can't test it right now, but my best guess is that the faulty line should read lend - i - 1 instead of lend - i. Really hope that one won't be a headache :crossed_fingers: :smile:
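
For readers following along, here is a minimal standalone sketch of why `lend - i` overruns the string and `lend - i - 1` does not. It is not DnaChisel code: `restrict_minus_strand`, `COMPLEMENT` and the `offset` parameter are hypothetical names invented for illustration. It also assumes, as the traceback suggests, that the pattern is exactly as long as the location and that `i` runs over `range(start, end)`.

```python
# Simplified stand-in for the strand=-1 branch seen in the traceback.
# All names here are hypothetical; this is not the library's implementation.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def restrict_minus_strand(pattern, start, end, offset=1):
    """Return (position, nucleotide) pairs for the + strand such that the
    - strand reads `pattern` over [start, end).

    offset=1 mirrors the proposed fix (end - i - 1).
    offset=0 mirrors the original expression (end - i).
    """
    assert len(pattern) == end - start
    # For i == start, end - i == len(pattern), which is one past the last
    # valid index. Subtracting 1 keeps the index in 0 .. len(pattern) - 1.
    return [(i, COMPLEMENT[pattern[end - i - offset]]) for i in range(start, end)]


if __name__ == "__main__":
    # With the fix, the + strand reads "CAG", the reverse complement of "CTG".
    print(restrict_minus_strand("CTG", 10, 13))
    try:
        restrict_minus_strand("CTG", 10, 13, offset=0)
    except IndexError as error:
        print("offset=0 fails:", error)
```

With `offset=0` the very first iteration indexes `pattern[len(pattern)]`, which matches the `IndexError: string index out of range` reported above.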

veghp commented 3 years ago

Thanks! That apparently solves the problem, but I will have to review the logic flow to make sure everything is okay. As a separate issue: if I specify no location, there is an error, because in https://github.com/Edinburgh-Genome-Foundry/DnaChisel/blob/9c72428ae822c1e5afba480232158e604c355699/dnachisel/builtin_specifications/EnforcePatternOccurence.py#L67 from_data() returns None as the location, and a value is then assigned to its attribute in https://github.com/Edinburgh-Genome-Foundry/DnaChisel/blob/9c72428ae822c1e5afba480232158e604c355699/dnachisel/builtin_specifications/EnforcePatternOccurence.py#L73

veghp commented 3 years ago

The fix is now committed to the dev branch and will be in the next release. If urgent, install with pip install --upgrade git+https://github.com/Edinburgh-Genome-Foundry/DnaChisel.git@dev.