Edinburgh-Genome-Foundry / DnaChisel

A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Avoid TF sites example does not work #65

Closed: mschmidt75 closed this issue 2 years ago

mschmidt75 commented 2 years ago

I am using Anaconda on a Windows PC. Everything is up to date on my side. I would really love to use this feature but I am not knowledgeable enough to write my own script.

Thanks for helping/fixing!!

I get the following error after executing the example file:


NoSolutionError                           Traceback (most recent call last)
C:\Users\Public\Documents\Wondershare\CreatorTemp/ipykernel_23256/3391708123.py in
     24 )
     25
---> 26 problem.resolve_constraints()
     27 problem.max_random_iters = 20000
     28 problem.to_record("sequence_without_tf_binding_sites.gb")

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraints(self, final_check, cst_filter)
    358             except NoSolutionError as error:
    359                 self.logger(constraint__index=len(constraints))
--> 360                 raise error
    361         if final_check:
    362             self.perform_final_constraints_check()

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraints(self, final_check, cst_filter)
    355         ):
    356             try:
--> 357                 self.resolve_constraint(constraint=constraint)
    358             except NoSolutionError as error:
    359                 self.logger(constraint__index=len(constraints))

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraint(self, constraint)
    319                             location__message="Cold exit",
    320                         )
--> 321                     raise error
    322                 else:
    323                     continue

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraint(self, constraint)
    303                     constraint.resolution_heuristic(local_problem)
    304                 else:
--> 305                     local_problem.resolve_constraints_locally()
    306                 self._replace_sequence(local_problem.sequence)
    307                 break

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraints_locally(self)
    171         """
    172         if self.mutation_space.space_size < self.randomization_threshold:
--> 173             self.resolve_constraints_by_exhaustive_search()
    174         else:
    175             self.resolve_constraints_by_random_mutations()

~\Anaconda3\lib\site-packages\dnachisel\DnaOptimizationProblem\mixins\ConstraintsSolverMixin.py in resolve_constraints_by_exhaustive_search(self)
     78         raise NoSolutionError(
     79             "Exhaustive search failed to satisfy all constraints.",
---> 80             problem=self,
     81         )
     82

NoSolutionError: While solving AvoidPattern0-3711 in 0-6:

Exhaustive search failed to satisfy all constraints.

Zulko commented 2 years ago

Can you share the code you are using?

mschmidt75 commented 2 years ago

It is the exact same code as here. I just changed the number of bps so it doesn't take forever.

Best, Matthias

Zulko commented 2 years ago

Oh yeah it's broken, good catch. What happened is that this example reads TF binding sites from a third-party web file, and this remote file probably got updated with some extra TFs since the last time @veghp updated the example. In particular, there are now single-nucleotide binding sites in that set! (not sure what that means). So saying "avoid TFs" now implies "avoid As, and Ts, and Gs, and Cs". This explains both why it took forever and why it failed with a NoSolutionError.
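To illustrate why single-nucleotide patterns make the problem unsolvable, here is a minimal pure-Python sketch. It does not use DnaChisel, and `find_violations` is a hypothetical helper written only for this illustration. It enumerates every possible 4-nucleotide sequence and checks each one against the patterns A, C, G and T:

```python
from itertools import product


def find_violations(sequence, patterns):
    """Return the patterns that occur somewhere in the sequence.

    Hypothetical helper for illustration; not part of DnaChisel.
    """
    return [p for p in patterns if p in sequence]


single_nt_patterns = ["A", "C", "G", "T"]

# Enumerate all 4**4 = 256 DNA sequences of length 4.
all_sequences = ["".join(nts) for nts in product("ACGT", repeat=4)]

# A sequence is acceptable only if it contains none of the patterns.
acceptable = [
    seq for seq in all_sequences
    if not find_violations(seq, single_nt_patterns)
]

print(len(all_sequences), len(acceptable))  # 256 0
```

No candidate survives, so any exhaustive search over such a window has to end without a solution, whatever the window size.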

A solution is to filter out these weird single-nucleotide TFs by adding one filtering line (the last line below) right after the tf_binding_sequences definition. It works and it's fast:

tf_binding_sequences = ["".join(ch for ch in tf if not ch.islower()) for tf in tf_list]
# Remove single-nucleotide TFs
tf_binding_sequences = [tf for tf in tf_binding_sequences if len(tf) > 1]

Let us know if that works.
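If the remote file changes again, it may help to see what the filter throws away. Below is a small sketch of the same filter wrapped in a function that also returns the dropped entries. The name `clean_tf_sites`, the `min_length` parameter and the sample strings are assumptions made up for this illustration; they come from neither DnaChisel nor the remote file.

```python
def clean_tf_sites(raw_sites, min_length=2):
    """Strip lowercase flanking letters and separate out too-short sites.

    Returns a (kept, dropped) pair so the caller can inspect what was
    filtered out. Hypothetical helper, not part of DnaChisel.
    """
    kept, dropped = [], []
    for raw in raw_sites:
        # Same extraction as in the example: drop the lowercase letters.
        site = "".join(ch for ch in raw if not ch.islower())
        if len(site) >= min_length:
            kept.append(site)
        else:
            dropped.append(site)
    return kept, dropped


# Made-up sample entries, for illustration only.
raw_sites = ["acgtTTGACAtgca", "ggcaAtcgg", "ccTATAATgg"]
kept, dropped = clean_tf_sites(raw_sites)
print("kept:", kept)
print("dropped:", dropped)
```

With `min_length=2` this keeps the same sites as the `len(tf) > 1` filter above; printing `dropped` makes it easy to notice when the source data starts producing unexpected sites.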

mschmidt75 commented 2 years ago

Awesome! It works perfectly fine after adding that one line of code. Figuring that out would have taken me forever. It is indeed odd that they have single nucleotides in that list.

Thanks for the great and quick response!

veghp commented 2 years ago

Thanks, @Zulko. Using archive.org, I can confirm that the file has changed; for example, this line:

ECK120012253    SlyA    ECK120033902    1230371 1230382 reverse ECK120033903    hlyE    +   hlyEp   90.5    gactgaaatcGTTGCAGATAAAacggtagaag    [GEA|W|Gene expression analysis],[BPP|S|Binding of purified proteins]   Strong

has become:

ECK120012253    SlyA    SlyA    ECK120033902    1230377 1230376 reverse ECK125301940    ECK120009794    hlyE    +   hlyEp   90.5    aaatcgttgcAgataaaacgg   18.5
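A quick check of the two sequence fields quoted above, using the same uppercase extraction as the example (a sketch only; `uppercase_core` is a hypothetical helper name):

```python
# Sequence fields copied from the two versions of the line quoted above.
old_site = "gactgaaatcGTTGCAGATAAAacggtagaag"
new_site = "aaatcgttgcAgataaaacgg"


def uppercase_core(site):
    """Keep everything except lowercase letters, as the example does."""
    return "".join(ch for ch in site if not ch.islower())


print(uppercase_core(old_site))  # GTTGCAGATAAA
print(uppercase_core(new_site))  # A
```

The old entry yields a 12-nucleotide site, while the new one yields the single nucleotide A, which is consistent with the single-nucleotide binding sites described earlier in this thread.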