Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License
213 stars 38 forks source link

optimize using usage table cann't get a high score when use alone. #26

Closed Lix1993 closed 4 years ago

Lix1993 commented 4 years ago

here is my code:

from dnachisel import *

mEGFP_dna = 'ATGGTGAGCAAGGGCGAGGAGCTGTTCACCGGGGTGGTGCCCATCCTGGTCGAGCTGGACGGCGACGTAAACGGCCACAAGTTCAGCGTGCGCGGCGAGGGCGAGGGCGATGCCACCAACGGCAAGCTGACCCTGAAGTTCATCTGCACCACCGGCAAGCTGCCCGTGCCCTGGCCCACCCTCGTGACCACCCTGACCTACGGCGTGCAGTGCTTCAGCCGCTACCCCGACCACATGAAGCAGCACGACTTCTTCAAGTCCGCCATGCCCGAAGGCTACGTCCAGGAGCGCACCATCTCCTTCAAGGACGACGGCACCTACAAGACCCGCGCCGAGGTGAAGTTCGAGGGCGACACCCTGGTGAACCGCATCGAGCTGAAGGGCATCGACTTCAAGGAGGACGGCAACATCCTGGGGCACAAGCTGGAGTACAACTTCAACAGCCACAACGTCTATATCACGGCCGACAAGCAGAAGAACGGCATCAAGGCGAACTTCAAGATCCGCCACAACGTCGAGGACGGCAGCGTGCAGCTCGCCGACCACTACCAGCAGAACACCCCCATCGGCGACGGCCCCGTGCTGCTGCCCGACAACCACTACCTGAGCACCCAGTCCAAGCTGAGCAAAGACCCCAACGAGAAGCGCGATCACATGGTCCTGCTGGAGTTCGTGACCGCCGCCGGGATCACTCTCGGCATGGACGAGCTGTACAAGTAA'

problem = DnaOptimizationProblem(
    sequence = mEGFP_dna, 
    constraints= [
        EnforceTranslation(),
#         EnforceGCContent('15-90%/20bp'),
#         EnforceGCContent('15-80%/50bp'),
#         EnforceGCContent('28-76%/100bp'), # GC含量
        AvoidHairpins(stem_size=6, hairpin_window=40,location=(0,40)),
        AvoidPattern(r'(.)\1{8}'),   # v
#         AvoidRareCodons(species='284590',min_frequency = 0.1) # 避免出现usage 小于0.1 的稀有密码子
    ],
    objectives=[
        CodonOptimize(species='284590',method = "match_codon_usage"),
#         EnforceGCContent(target = 0.4005),
#         MinimizeMFE( temperature = 30,location=(0,40)), # RNAfold 计算mfe

    ], 
)
print("BEFORE OPTIMIZATION:\n\n", problem.constraints_text_summary())
print(problem.objectives_text_summary())

problem.resolve_constraints()
problem.optimize()

print("AFTER OPTIMIZATION:\n\n", problem.constraints_text_summary())
print(problem.objectives_text_summary())
Lix1993 commented 4 years ago

its result:


 ===> SUCCESS - all constraints evaluations pass
✔PASS ┍ EnforceTranslation[0-720]
      │ Enforced by nucleotides restrictions
✔PASS ┍ AvoidHairpins[0-40](stem_size:6, hairpin_window:40)
      │ Score:         0. Locations: []
✔PASS ┍ AvoidPattern[0-720](pattern:(.)\1{8})
      │ Passed. Pattern not found !

===> TOTAL OBJECTIVES SCORE:   -307.83
   -307.83 ┍ MatchTargetCodonUsage[0-720](284590) 
           │ Codon opt. on window 0-720 scored -3.08E+02
Lix1993 commented 4 years ago

when i usage another objective,such as GC or mfe calculated from RNAfold,

        CodonOptimize(species='284590',method = "match_codon_usage"),
         EnforceGCContent(target = 0.4005),
#        MinimizeMFE( temperature = 30,location=(0,40)), # RNAfold 计算mfe
    ], 

result:

AFTER OPTIMIZATION:

 ===> SUCCESS - all constraints evaluations pass
✔PASS ┍ EnforceTranslation[0-720]
      │ Enforced by nucleotides restrictions
✔PASS ┍ AvoidHairpins[0-40](stem_size:6, hairpin_window:40)
      │ Score:         0. Locations: []
✔PASS ┍ AvoidPattern[0-720](pattern:(.)\1{8})
      │ Passed. Pattern not found !

===> TOTAL OBJECTIVES SCORE:    -19.95
    -19.95 ┍ MatchTargetCodonUsage[0-720](284590) 
           │ Codon opt. on window 0-720 scored -1.99E+01
     -0.00 ┍ EnforceGCContent[0-720](target:0.40) 
           │ Out of bound on segments 0-720
Lix1993 commented 4 years ago

I think optimize the result sequence more than once may help, or I can increase the iter numbers for objective optimize. But these methods ask me to determint the iter numbers.

Set the iter number to a bigger one and break the loop when score doesn't change for the last 100 mutations will make the optimization more smart, I think.

Zulko commented 4 years ago

I'll have a look at this issue, this looks like a bug (what is strange is that the tests on this objective pass).

Regarding your last point " break the loop when score doesn't change for the last 100 mutations", this is already the case and the default is already 100, you can set it differently with `problem.optimization_stagnation_tolerance = 50.See here.

Lix1993 commented 4 years ago

It seems work well when I use dnachisel installed from pip,

doesn't work when I run on the folder downloaded from github.

Lix1993 commented 4 years ago

version 3.0.1 seems work well.

Zulko commented 4 years ago

Turns out this was a bug introduced with my last commit (and I modified a test to make it pass, not realizing it was a bug) sorry for that. It is now fixed on Github.

Zulko commented 4 years ago

Small remark for the future, it is better to write:

AvoidPattern(SequencePattern('(.)\1{8}', size=8))

This way you are telling DnaChisel that the pattern in the regular expression has an expected size of 8.

Lix1993 commented 4 years ago

Thanks

On Wed, Dec 4, 2019 at 10:18 PM Zulko notifications@github.com wrote:

Small remark for the future, it is better to write:

AvoidPattern(SequencePattern('(.)\1{8}', size=8))

This way you are telling DnaChisel that the pattern in the regular expression has an expected size of 8.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/Edinburgh-Genome-Foundry/DnaChisel/issues/26?email_source=notifications&email_token=AC77DX5YVZOYRG2JMAYZXLDQW633JA5CNFSM4JVEXGHKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEF5FKGY#issuecomment-561665307, or unsubscribe https://github.com/notifications/unsubscribe-auth/AC77DXYHEGHXPWDBTUESMODQW633JANCNFSM4JVEXGHA .