Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Can I generate a sequence with a 'medium' score? #27

Open Lix1993 opened 4 years ago

Lix1993 commented 4 years ago

By default, DnaChisel returns a 'best' sequence with a score close to 0.
I can get a 'worst' sequence by changing the acceptance condition to new_score < score.

If I want a 'medium' sequence, I can set a target score and change the condition to abs(new_score - target) < abs(score - target).
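A minimal sketch of the three acceptance rules described above, in plain Python rather than DnaChisel. `gc_score`, `accept` and `toy_optimize` are hypothetical helpers, and the random single-base mutation loop is a simplification, not a copy of what DnaChisel's optimize() does internally. As in the question, higher scores are better and the best score is 0.

```python
import random

BASES = "ACGT"


def gc_score(sequence):
    """Hypothetical toy objective: 0 at 50% GC, more negative further away."""
    gc = sum(base in "GC" for base in sequence)
    return -abs(gc - len(sequence) / 2)


def accept(new_score, score, mode, target=None):
    """Decide whether to keep a mutation, depending on the mode."""
    if mode == "best":  # default behaviour: climb towards 0
        return new_score > score
    if mode == "worst":  # flipped condition from the question
        return new_score < score
    if mode == "medium":  # move closer to a chosen target score
        return abs(new_score - target) < abs(score - target)
    raise ValueError(f"unknown mode: {mode}")


def toy_optimize(sequence, mode, target=None, n_iter=2000, seed=0):
    """Try random single-base mutations, keeping those accept() allows."""
    rng = random.Random(seed)
    seq = list(sequence)
    score = gc_score(seq)
    for _ in range(n_iter):
        i = rng.randrange(len(seq))
        old_base = seq[i]
        seq[i] = rng.choice(BASES)
        new_score = gc_score(seq)
        if accept(new_score, score, mode, target):
            score = new_score
        else:
            seq[i] = old_base  # revert the rejected mutation
    return "".join(seq), score
```

Usage: `toy_optimize(seq, "best")`, `toy_optimize(seq, "worst")` and `toy_optimize(seq, "medium", target=-5)` each need a separate run, and the "medium" mode needs a target that was chosen beforehand, which is exactly the difficulty raised in the question.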

Since scores vary from one objective to another, I first have to compute the worst score to determine the target score.

Is there a way to generate all 3 sequences (best, medium and worst) with a single call to optimize()?

Zulko commented 4 years ago

That's a complex problem, and I don't think it can be done efficiently by changing how optimize works.

A lot of the optimization efficiency comes from the way the specifications work, not from the optimize method. The specification tells the solver which regions should be mutated, and the regions worth mutating are different depending on whether you want to optimize or de-optimize the sequence. So the best way to "de-optimize" a sequence is to define a new specification (let's call it the anti-specification) whose evaluate method does the contrary: its score is the "opposite" of the original spec's score, and the locations it suggests for optimization are the complement of the locations suggested by the original spec. But I agree it is a lot of work.
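A toy illustration of the anti-specification idea in plain Python. `ToyAvoidPattern`, `AntiSpecification` and `Evaluation` are hypothetical names, not DnaChisel classes (real DnaChisel specifications return richer evaluation objects). The sketch only shows the two properties described above: the anti-spec negates the score, and it suggests the complement of the original spec's locations.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


@dataclass
class Evaluation:
    score: float
    locations: List[Interval]


def complement_intervals(intervals, length):
    """Return the parts of [0, length) not covered by any interval."""
    result, cursor = [], 0
    for start, end in sorted(intervals):
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        result.append((cursor, length))
    return result


class ToyAvoidPattern:
    """Hypothetical spec: score is minus the number of pattern occurrences."""

    def __init__(self, pattern):
        self.pattern = pattern

    def evaluate(self, sequence):
        n = len(self.pattern)
        hits = [
            (i, i + n)
            for i in range(len(sequence) - n + 1)
            if sequence[i:i + n] == self.pattern
        ]
        return Evaluation(score=-len(hits), locations=hits)


class AntiSpecification:
    """Wrap a spec: opposite score, complementary suggested locations."""

    def __init__(self, spec):
        self.spec = spec

    def evaluate(self, sequence):
        ev = self.spec.evaluate(sequence)
        return Evaluation(
            score=-ev.score,
            locations=complement_intervals(ev.locations, len(sequence)),
        )
```

For example, on `"AAGGGTTGGGAA"` with pattern `"GGG"`, the original spec scores -2 and points at the two GGG runs, while the anti-spec scores 2 and points at the regions between them.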

The other question is how you determine a specification's "worst score". That can be complicated, as the worst score can depend on your constraints. The way I would approach it is by defining an anti-specification and running the optimization once with only the anti-specification as the objective. The final score is then the worst score, and a medium score is anything in between.
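Once a best score and a worst score are known (the worst one from the anti-specification run), a "medium" target can be picked between them. `medium_target` is a hypothetical helper, and linear interpolation is just one reasonable way to choose the target.

```python
def medium_target(best_score, worst_score, fraction=0.5):
    """Interpolate between the best and worst scores.

    fraction=0 returns the best score, fraction=1 the worst score.
    Assumes higher scores are better, so worst_score <= best_score.
    """
    if worst_score > best_score:
        raise ValueError("worst_score should not exceed best_score")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return best_score + fraction * (worst_score - best_score)
```

For example, with a best score of 0 and a worst score of -10, `medium_target(0.0, -10.0)` gives -5.0 and `fraction=0.25` gives a target closer to the best score.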

The last point is how you reach a given "medium" score. Here it is even more complicated, because the regions to mutate depend on whether you are currently above or below the target score. The closest thing I have implemented is EnforceChanges(). Used alone, it maximizes the changes in the affected area, but you can set amount_percent=40 to …
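A sketch of why reaching a target is harder: the regions to mutate switch depending on which side of the target the current score is. This assumes, as in the thread, that higher scores are better, and it follows the anti-specification idea of using the complement of the original spec's locations when the sequence is "too good". `locations_to_mutate` and `complement_intervals` are hypothetical helpers, not DnaChisel functions.

```python
def complement_intervals(intervals, length):
    """Return the parts of [0, length) not covered by any interval."""
    result, cursor = [], 0
    for start, end in sorted(intervals):
        if start > cursor:
            result.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < length:
        result.append((cursor, length))
    return result


def locations_to_mutate(score, target, spec_locations, length):
    """Pick the regions to mutate when steering a score towards target."""
    if score < target:
        # Below target (too bad): fix the problems the original spec flags.
        return list(spec_locations)
    if score > target:
        # Above target (too good): de-optimize via the complementary regions.
        return complement_intervals(spec_locations, length)
    return []  # already on target, nothing to mutate
```

Because the answer flips as the score crosses the target, a solver aiming at a medium score would need to re-evaluate this choice after each accepted mutation.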

If you don't want to dive into all this yet, there may be a quicker solution (but it is untested). You can try using AvoidChanges(max_edits_percent=20), which restricts changes to at most 20% of the sequence. You can use it as a constraint or as an objective. Intuitively, this should help you find "medium-optimized" sequences.
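A plain-Python approximation of the edit-budget idea behind AvoidChanges(max_edits_percent=20), showing the constraint form and an objective form side by side. This is not DnaChisel's implementation: `edit_percent`, `within_edit_budget` and `edit_penalty` are hypothetical helpers, and the shape of the penalty in the objective version is an assumption.

```python
def edit_percent(original, candidate):
    """Percentage of positions where candidate differs from original."""
    if len(original) != len(candidate):
        raise ValueError("sequences must have the same length")
    edits = sum(a != b for a, b in zip(original, candidate))
    return 100.0 * edits / len(original)


def within_edit_budget(original, candidate, max_edits_percent=20):
    """Constraint form: a candidate is valid only within the edit budget."""
    return edit_percent(original, candidate) <= max_edits_percent


def edit_penalty(original, candidate, max_edits_percent=20):
    """Objective form (assumed shape): 0 within budget, negative beyond it."""
    excess = edit_percent(original, candidate) - max_edits_percent
    return -max(0.0, excess)
```

As a constraint, the budget caps how far the optimizer can move from the starting sequence, so other objectives can only be partially improved; as an objective, it trades off against them instead of blocking them outright.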

Lix1993 commented 4 years ago

Thanks

Zulko commented 4 years ago

Let's keep this issue open for now so that other people can find it.