fsmosca opened this issue 3 years ago
Hi @fsmosca, the simplest way to address this is to use the CACHE_FILE parameter. Evaluations are saved to a file. When NOMAD is restarted, the file is read and the new run uses the best points found in it. This ensures that no point is evaluated twice. However, the MADS algorithm restarts from the initial mesh size given by the parameters, the evaluation counter is reset, and so on. If you need NOMAD to resume from the exact state it was in when it was interrupted, you might want to look at the advanced parameters HOT_RESTART_READ_FILES and HOT_RESTART_WRITE_FILES.
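A minimal sketch of the two options written as PyNomad parameter lines, in the same `'NAME value'` format the script posted below uses. Only CACHE_FILE, HOT_RESTART_READ_FILES and HOT_RESTART_WRITE_FILES come from the reply above; the helper name `restart_params` is hypothetical.

```python
from typing import List


def restart_params(cache_file: str, hot_restart: bool = False) -> List[str]:
    """Build NOMAD parameter lines for resuming an interrupted run (hypothetical helper).

    cache_file only: previously evaluated points are reloaded and not
    re-evaluated, but MADS restarts from the initial mesh size and the
    evaluation counter is reset.
    hot_restart=True: also write NOMAD's state when it stops and read it
    back on the next start, so the run continues where it left off.
    """
    params = [f"CACHE_FILE {cache_file}"]
    if hot_restart:
        # Same spelling as the script below, where f'{True}' gives "True".
        params += ["HOT_RESTART_READ_FILES True", "HOT_RESTART_WRITE_FILES True"]
    return params


base = ["BB_OUTPUT_TYPE OBJ", "MAX_BB_EVAL 100"]
print(base + restart_params("cache.txt"))
print(base + restart_params("cache.txt", hot_restart=True))
```

The resulting list is what would be passed as the last argument of `PyNomad.optimize`.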
Thanks, I will try your suggestions.
All right, here is an interrupt-and-resume example in Python using the HOT_RESTART parameters. Please check whether it is right.
```python
#!/usr/bin/python
"""
interrupt_resume.py

When an optimization is interrupted, you can resume it by using the parameters
HOT_RESTART_READ_FILES and HOT_RESTART_WRITE_FILES. Set these parameters
to True. Also define a CACHE_FILE where past optimization data will be saved.

Note:
NOMAD generates hotrestart.txt. If hotrestart.txt exists but your cache file
has not been created yet, delete hotrestart.txt before running to avoid a
segmentation fault.
"""
import PyNomad


def objective_f(opt_param):
    """
    Booth function:
    f = (x + 2*y - 7)**2 + (2*x + y - 5)**2
    f is 0 at x=1, y=3
    Ref: https://en.wikipedia.org/wiki/Test_functions_for_optimization

    opt_param: the point to evaluate, as passed in by PyNomad
    (provides get_coord() and setBBO()).
    """
    x = opt_param.get_coord(0)
    y = opt_param.get_coord(1)
    f = (x + 2*y - 7)**2 + (2*x + y - 5)**2
    # Send the blackbox output back to NOMAD as a UTF-8 encoded string.
    opt_param.setBBO(str(f).encode("UTF-8"))
    return 1  # 1: success, 0: failed evaluation


if __name__ == "__main__":
    # Parameter options
    bb_output_type = 'OBJ'
    max_bb_eval = 100
    bb_input_type = '* R'  # all variables are real
    max_eval = 5000
    cache_fn = 'hot_restart_cache.txt'
    restart = True  # read and write the hot restart files

    params = [
        f'BB_OUTPUT_TYPE {bb_output_type}',
        f'MAX_BB_EVAL {max_bb_eval}',
        f'BB_INPUT_TYPE {bb_input_type}',
        f'MAX_EVAL {max_eval}',
        f'CACHE_FILE {cache_fn}',
        f'HOT_RESTART_READ_FILES {restart}',
        f'HOT_RESTART_WRITE_FILES {restart}'
    ]

    # Initial point and bounds.
    init_opt_param = [0., 0.]
    lb = [-10., -10.]
    ub = [10., 10.]

    # Start the optimization (interrupt with Ctrl-C, then rerun to resume).
    best_param, best_value, _, num_evals, num_iters, _ = PyNomad.optimize(
        objective_f, init_opt_param, lb, ub, params)

    print()
    print(f'best param : {best_param}')
    print(f'best value : {best_value}')
    print(f'num evals : {num_evals}')
    print(f'num_iters : {num_iters}')
```
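The docstring warns that a leftover hotrestart.txt without a cache file can end in a segmentation fault. A small guard such as the hypothetical `remove_stale_hot_restart` below could be called just before `PyNomad.optimize`. It assumes hotrestart.txt is written to the current working directory, as the second run's log ("Read hot restart file ..././hotrestart.txt") suggests.

```python
import os
import tempfile


def remove_stale_hot_restart(cache_fn: str, hot_restart_fn: str = "hotrestart.txt") -> bool:
    """Delete hot_restart_fn when cache_fn does not exist yet.

    Returns True if a stale hot restart file was removed, False otherwise.
    """
    if not os.path.exists(cache_fn) and os.path.exists(hot_restart_fn):
        os.remove(hot_restart_fn)
        return True
    return False


if __name__ == "__main__":
    # Demo in a throwaway directory: hot restart file present, no cache file.
    with tempfile.TemporaryDirectory() as d:
        hot = os.path.join(d, "hotrestart.txt")
        open(hot, "w").close()
        print(remove_stale_hot_restart(os.path.join(d, "cache.txt"), hot))  # True
```

In the script above, the call would be `remove_stale_hot_restart(cache_fn)` right after `cache_fn` is defined.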
First run (interrupted with Ctrl-C):
python interrupt_resume.py
BBE OBJ
1 74 *
3 74
3 90
5 290
5 2 *
6 650
7 5
8 41
9 5
10 20
12 80
12 180
14 104
14 164
15 0 *
16 18
^C
NOMAD caught User interruption.
Please wait...
A termination criterion is reached: Ctrl-C (Base)
Save information for hot restart.
Write hot restart file.
Best feasible solution: #674 ( 1 3 ) Evaluation OK f = 0 h = 0
Best infeasible solution: Undefined.
Blackbox evaluations: 16
Total model evaluations: 732
Cache hits: 2
Total number of evaluations: 18
best param : [1.0, 3.0]
best value : 0.0
num evals : 16
num_iters : 0
Contents of the cache file (hot_restart_cache.txt) after the first run:
CACHE_HITS 2
BB_OUTPUT_TYPE OBJ
( -3 1 ) EVAL_OK ( 164.0 )
( -2 -2 ) EVAL_OK ( 290.0 )
( -2 2 ) EVAL_OK ( 74.0 )
( -2 6 ) EVAL_OK ( 18.0 )
( 0 0 ) EVAL_OK ( 74.0 )
( 1 1 ) EVAL_OK ( 20.0 )
( 1 2 ) EVAL_OK ( 5.0 )
( 1 3 ) EVAL_OK ( 0.0 )
( 1 7 ) EVAL_OK ( 80.0 )
( 2 -2 ) EVAL_OK ( 90.0 )
( 2 2 ) EVAL_OK ( 2.0 )
( 2 3 ) EVAL_OK ( 5.0 )
( 3 -3 ) EVAL_OK ( 104.0 )
( 3 4 ) EVAL_OK ( 41.0 )
( 7 3 ) EVAL_OK ( 180.0 )
( 8 8 ) EVAL_OK ( 650.0 )
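Because the cache file is plain text, it can be inspected between runs. A sketch that parses lines of the form `( x1 x2 ) EVAL_OK ( f )`, as in the dump above, and returns the best point. The format is inferred from this dump only (single objective, no constraints), and `parse_cache` and `best_point` are hypothetical names.

```python
import re
from typing import List, Tuple

# "( x1 x2 ... ) STATUS ( f1 f2 ... )", as in the cache dump above.
LINE_RE = re.compile(r"^\(\s*([^)]*?)\s*\)\s+(\S+)\s+\(\s*([^)]*?)\s*\)\s*$")

Record = Tuple[Tuple[float, ...], str, Tuple[float, ...]]


def parse_cache(lines: List[str]) -> List[Record]:
    """Return (point, status, outputs) for every evaluation line."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:  # header lines such as "CACHE_HITS 2" or "BB_OUTPUT_TYPE OBJ"
            continue
        x = tuple(float(v) for v in m.group(1).split())
        outputs = tuple(float(v) for v in m.group(3).split())
        records.append((x, m.group(2), outputs))
    return records


def best_point(records: List[Record]) -> Tuple[Tuple[float, ...], float]:
    """Return (point, objective) with the lowest objective among EVAL_OK records."""
    ok = [(x, out[0]) for x, status, out in records if status == "EVAL_OK"]
    return min(ok, key=lambda r: r[1])


dump = """CACHE_HITS 2
BB_OUTPUT_TYPE OBJ
( -3 1 ) EVAL_OK ( 164.0 )
( 1 3 ) EVAL_OK ( 0.0 )
( 2 2 ) EVAL_OK ( 2.0 )""".splitlines()
print(best_point(parse_cache(dump)))  # ((1.0, 3.0), 0.0)
```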
Second run (resumed from hotrestart.txt):
python interrupt_resume.py
Read hot restart file /home/username/mynomad/nomad/interfaces/PyNomad/./hotrestart.txt
BBE OBJ
18 290
18 290
20 50
20 50
22 9
22 9
23 41
25 18
25 18
26 2
27 1.625
28 1.625
29 0.6473
30 0.8573
31 0.2421
32 0.617
33 0.2061
34 0.225
35 0.1025
36 0.0986
37 0.2061
38 0.0305
39 0.0845
40 0.0234
41 0.0317
42 0.0117
43 0.0117
44 0.0045
45 0.009
46 0.0017
47 0.0026
48 0.0005
49 0.0018
50 0.0002
51 0.0009
53 0.4905
53 0.4905
55 4.4105
55 4.4105
57 0.08
BBE OBJ
57 0.08
59 0.464
59 0.464
61 0.02
61 0.02
63 0.116
63 0.116
64 0.000113
65 0.000313
66 0.000056
67 0.000115
68 0.000036
69 0.000038
70 0.000014
71 0.000024
72 0.000007
73 0.000013
74 0.000005
75 0.000005
76 0.000002
77 0.000003
78 0.000001
80 0.025625
79 0.025625
82 0.005625
82 0.005625
83 0.000002
84 0.000001
86 0.00072
86 0.00072
88 0.005776
88 0.005776
89 0.000001
90 0.0
91 0.0
92 0.0
93 0.0
94 0.0
96 0.000446
96 0.000446
BBE OBJ
98 0.000558
98 0.000558
99 0.0
100 0.0
A termination criterion is reached: Max number of blackbox evaluations (Eval Global) No more points to evaluate 100
Save information for hot restart.
Write hot restart file.
Best feasible solution: #7 ( 1 3 ) Evaluation OK f = 0 h = 0
Best infeasible solution: Undefined.
Blackbox evaluations: 100
Total model evaluations: 1892
Cache hits: 22
Total number of evaluations: 122
best param : [1.0, 3.0]
best value : 0.0
num evals : 100
num_iters : 0
Contents of the cache file after the second run:
CACHE_HITS 22
BB_OUTPUT_TYPE OBJ
( -4 0 ) EVAL_OK ( 290.0 )
( -3 1 ) EVAL_OK ( 164.0 )
( -2 -2 ) EVAL_OK ( 290.0 )
( -2 2 ) EVAL_OK ( 74.0 )
( -2 6 ) EVAL_OK ( 18.0 )
( -2 8 ) EVAL_OK ( 50.0 )
( -1 2 ) EVAL_OK ( 41.0 )
( 0 0 ) EVAL_OK ( 74.0 )
( 0 2 ) EVAL_OK ( 18.0 )
( 0 4 ) EVAL_OK ( 2.0 )
( 0 5 ) EVAL_OK ( 9.0 )
( 0.25 3.25 ) EVAL_OK ( 1.625 )
( 0.43999999999999994671 3.6899999999999995026 ) EVAL_OK ( 0.8572999999999981 )
( 0.5 3.4900000000000002132 ) EVAL_OK ( 0.49050000000000016 )
( 0.51000000000000000888 2.5 ) EVAL_OK ( 4.410500000000001 )
( 0.64000000000000001332 3.329999999999999627 ) EVAL_OK ( 0.24209999999999982 )
( 0.69000000000000005773 2.9399999999999999467 ) EVAL_OK ( 0.6472999999999993 )
( 0.80000000000000004441 3.1200000000000001066 ) EVAL_OK ( 0.07999999999999964 )
( 0.87000000000000010658 3.1199999999999996625 ) EVAL_OK ( 0.031700000000000034 )
( 0.88000000000000000444 2.7999999999999998224 ) EVAL_OK ( 0.46400000000000086 )
( 0.89000000000000012434 3.2000000000000001776 ) EVAL_OK ( 0.08450000000000052 )
( 0.89000000000000001332 3.2800000000000002487 ) EVAL_OK ( 0.2061000000000002 )
( 0.9000000000000000222 3.0600000000000000533 ) EVAL_OK ( 0.019999999999999928 )
( 0.93999999999999994671 2.8999999999999999112 ) EVAL_OK ( 0.11600000000000016 )
( 0.94999999999999995559 3.0249999999999999112 ) EVAL_OK ( 0.005625000000000027 )
( 0.95000000000000006661 3.3900000000000001243 ) EVAL_OK ( 0.6170000000000007 )
( 0.96999999999999997335 3.010000000000000675 ) EVAL_OK ( 0.0025999999999998715 )
( 0.96999999999999997335 3.0299999999999998046 ) EVAL_OK ( 0.0017999999999999765 )
( 0.97000000000000008438 3.0899999999999998579 ) EVAL_OK ( 0.023399999999999855 )
( 0.9749999999999999778 2.9500000000000001776 ) EVAL_OK ( 0.02562499999999993 )
( 0.97999999999999998224 3.0158000000000000362 ) EVAL_OK ( 0.0007202000000000124 )
( 0.98419999999999996376 2.9799999999999999822 ) EVAL_OK ( 0.0057762000000000134 )
( 0.98750000000000004441 3.0075000000000002842 ) EVAL_OK ( 0.00031250000000000445 )
( 0.98999999999999999112 2.999299999999999855 ) EVAL_OK ( 0.0005584499999999901 )
( 0.98999999999999999112 3 ) EVAL_OK ( 0.0004999999999999787 )
( 0.99550000000000005151 3.0030000000000005578 ) EVAL_OK ( 3.8249999999994906e-05 )
( 0.99749999999999994227 2.9975000000000000533 ) EVAL_OK ( 0.00011250000000000852 )
( 0.99770000000000003126 3.0009000000000001229 ) EVAL_OK ( 1.3940000000001637e-05 )
( 0.99780000000000002025 3.0034999999999993925 ) EVAL_OK ( 2.384999999998742e-05 )
( 0.99830000000000007621 3.0020000000000002238 ) EVAL_OK ( 7.250000000002267e-06 )
( 0.99929999999999996607 3.0099999999999997868 ) EVAL_OK ( 0.0004464499999999693 )
( 0.99939999999999962199 3.0002999999999997449 ) EVAL_OK ( 8.100000000026193e-07 )
( 0.99960000000000004405 3.0004999999999997229 ) EVAL_OK ( 4.4999999999950124e-07 )
( 0.99970000000000003304 3.000100000000000211 ) EVAL_OK ( 2.5999999999967627e-07 )
( 0.99970000000000003304 3.000300000000000189 ) EVAL_OK ( 1.799999999996939e-07 )
( 0.99990000000000001101 2.9974000000000002863 ) EVAL_OK ( 3.592999999998973e-05 )
( 0.99990000000000001101 2.999499999999999833 ) EVAL_OK ( 1.700000000000425e-06 )
( 0.99990000000000001101 3.000199999999999978 ) EVAL_OK ( 9.000000000011341e-08 )
( 0.99990000000000001101 3.0011000000000001009 ) EVAL_OK ( 5.2200000000012485e-06 )
( 1 1 ) EVAL_OK ( 20.0 )
( 1 2 ) EVAL_OK ( 5.0 )
( 1 3 ) EVAL_OK ( 0.0 )
( 1 3.000199999999999978 ) EVAL_OK ( 1.999999999997783e-07 )
( 1 7 ) EVAL_OK ( 80.0 )
( 1.0000750000000000473 2.9999999999999995559 ) EVAL_OK ( 2.8124999999735678e-08 )
( 1.000099999999999989 2.9996000000000000441 ) EVAL_OK ( 5.300000000000165e-07 )
( 1.000199999999999978 2.9998999999999993449 ) EVAL_OK ( 8.999999999958049e-08 )
( 1.0005999999999999339 3.001100000000000545 ) EVAL_OK ( 1.3130000000003103e-05 )
( 1.0006999999999999229 2.9900000000000002132 ) EVAL_OK ( 0.0004464499999999693 )
( 1.000700000000000145 2.9993000000000002991 ) EVAL_OK ( 9.799999999991624e-07 )
( 1.0008000000000001339 2.999800000000000022 ) EVAL_OK ( 2.1200000000007763e-06 )
( 1.0008999999999999009 2.9987000000000003652 ) EVAL_OK ( 3.1399999999992195e-06 )
( 1.0015999999999998238 2.9990999999999998771 ) EVAL_OK ( 5.330000000000025e-06 )
( 1.0043999999999999595 2.9944000000000001727 ) EVAL_OK ( 5.6479999999999635e-05 )
( 1.0068999999999999062 2.9969000000000005635 ) EVAL_OK ( 0.00011497999999999999 )
( 1.0099999999999997868 2.9499999999999997335 ) EVAL_OK ( 0.009000000000000202 )
( 1.0100000000000000089 2.9900000000000002132 ) EVAL_OK ( 0.00019999999999999147 )
( 1.0100000000000000089 3.000700000000000145 ) EVAL_OK ( 0.0005584499999999901 )
( 1.0158000000000000362 3.0200000000000000178 ) EVAL_OK ( 0.0057762000000000134 )
( 1.0199999999999997957 2.9900000000000002132 ) EVAL_OK ( 0.0008999999999999616 )
( 1.0200000000000000178 2.9700000000000001954 ) EVAL_OK ( 0.0016999999999999275 )
( 1.0200000000000000178 2.9841999999999999638 ) EVAL_OK ( 0.0007202000000000124 )
( 1.0249999999999999112 3.0499999999999998224 ) EVAL_OK ( 0.02562499999999993 )
( 1.0400000000000000355 3.0100000000000002309 ) EVAL_OK ( 0.011700000000000035 )
( 1.0500000000000000444 2.9599999999999999645 ) EVAL_OK ( 0.004500000000000075 )
( 1.0500000000000000444 2.9750000000000000888 ) EVAL_OK ( 0.005625000000000027 )
( 1.0600000000000000533 3.1000000000000000888 ) EVAL_OK ( 0.11600000000000016 )
( 1.0699999999999998401 2.9199999999999999289 ) EVAL_OK ( 0.01169999999999993 )
( 1.1000000000000000888 2.9399999999999999467 ) EVAL_OK ( 0.020000000000000143 )
( 1.1000000000000000888 2.9700000000000001954 ) EVAL_OK ( 0.030500000000000048 )
( 1.1000000000000000888 3.0500000000000002665 ) EVAL_OK ( 0.10250000000000042 )
( 1.1099999999999998757 2.7200000000000001954 ) EVAL_OK ( 0.2060999999999994 )
( 1.1200000000000001066 3.2000000000000001776 ) EVAL_OK ( 0.46400000000000086 )
( 1.1999999999999999556 2.8799999999999998934 ) EVAL_OK ( 0.07999999999999964 )
( 1.2099999999999999645 2.7700000000000004619 ) EVAL_OK ( 0.0985999999999997 )
( 1.25 2.25 ) EVAL_OK ( 1.625 )
( 1.25 2.9500000000000006217 ) EVAL_OK ( 0.22500000000000134 )
( 1.4899999999999999911 3.5 ) EVAL_OK ( 4.4105000000000025 )
( 1.5 2.5099999999999997868 ) EVAL_OK ( 0.49050000000000016 )
( 2 -2 ) EVAL_OK ( 90.0 )
( 2 1 ) EVAL_OK ( 9.0 )
( 2 2 ) EVAL_OK ( 2.0 )
( 2 3 ) EVAL_OK ( 5.0 )
( 2 4 ) EVAL_OK ( 18.0 )
( 3 -3 ) EVAL_OK ( 104.0 )
( 3 4 ) EVAL_OK ( 41.0 )
( 4 -2 ) EVAL_OK ( 50.0 )
( 6 6 ) EVAL_OK ( 290.0 )
( 7 3 ) EVAL_OK ( 180.0 )
( 8 8 ) EVAL_OK ( 650.0 )
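One way to check that the hot restart kept everything from the interrupted run is to confirm that every point in the first cache dump also appears in the second. With the two dumps in this thread, all 16 points of the first run are among the 100 points of the second, which matches MAX_BB_EVAL 100. A sketch of that check, assuming the same line format as the dumps; `cached_points` and `missing_after_restart` are hypothetical helpers.

```python
from typing import List, Set, Tuple


def cached_points(lines: List[str]) -> Set[Tuple[float, ...]]:
    """Collect the coordinates of every '( x ... ) STATUS ( f ... )' line."""
    points = set()
    for line in lines:
        line = line.strip()
        if line.startswith("("):
            coords = line[1:line.index(")")].split()
            points.add(tuple(float(c) for c in coords))
    return points


def missing_after_restart(first: List[str], second: List[str]) -> Set[Tuple[float, ...]]:
    """Points evaluated in the first run but absent from the resumed run's cache."""
    return cached_points(first) - cached_points(second)


first_run = ["CACHE_HITS 2", "( 0 0 ) EVAL_OK ( 74.0 )", "( 1 3 ) EVAL_OK ( 0.0 )"]
second_run = ["CACHE_HITS 22", "( -4 0 ) EVAL_OK ( 290.0 )",
              "( 0 0 ) EVAL_OK ( 74.0 )", "( 1 3 ) EVAL_OK ( 0.0 )"]
print(missing_after_restart(first_run, second_run))  # set()
```

Comparing coordinates as floats is fine here because the resumed cache reprints the earlier points with the same text.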
If this is right, I would like to contribute this code as an interrupt-and-resume example for the Python interface so other users can learn from it. Alternatively, I suggest that the NOMAD team create such an example in this repository.
Thank you for your suggested example; it seems to work fine! We will discuss whether to add it to our examples. Could you give us your name so we can add you to the contributors?
Here are answers to your questions:
My name is Ferdinand Mosca. Thanks for the definitions; I will do some research on them.
By the way, I came across this document while searching for how the poll step works; it has nice illustrations. If you have a link to something similar, you could also add it to this site.
Hi Ferdinand,
Thank you for the documentation suggestion. We do not currently have such high-level information online, but it is in our plans.
Viviane
I have tried using NOMAD to optimize the search parameters of a computer chess engine. There are two engines, base_engine and test_engine. The base engine uses the best parameter values found so far, while test_engine uses the parameter values suggested by NOMAD. The objective is the result of a 100-game engine-vs-engine match; the result is converted into a value to be minimized and sent to NOMAD.
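In the log below, the minimized value is one minus the actual match result (0.51 becomes 0.49, 0.445 becomes 0.5549999999999999). A sketch of that conversion, assuming the match result is test_engine's score fraction under the usual chess scoring (win = 1, draw = 0.5, loss = 0); `match_objective` is a hypothetical name, not part of the original script.

```python
def match_objective(wins: int, draws: int, losses: int) -> float:
    """Turn a test_engine-vs-base_engine match into a value for NOMAD to minimize.

    Assumes the score fraction uses win = 1, draw = 0.5, loss = 0.
    Minimizing 1 - score maximizes the test engine's score.
    """
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games
    return 1.0 - score


# 30 wins + 42 draws + 28 losses = 51 points out of 100 games.
print(match_objective(wins=30, draws=42, losses=28))  # 0.49
```

Minimizing `1 - score` rather than `-score` keeps the objective in [0, 1], with 0.5 meaning the two engines are equal.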
param to be optimized: ['LmrFactor', 'FutilityMargin', 'QsearchFutilityMargin']
base engine init param values: [50, 30, 50]
optimizer init param values: [50, 30, 50]
suggested param values: [50, 30, 50]
base engine param: [50, 30, 50]
actual result: 0.51, minimized result: 0.49
match done in 416.7s
BBE BLK_SIZE FRAME_CENTER OBJ
1 1 ( 0 0 0 ) *
suggested param values: [50, 40, 70]
base engine param: [50, 30, 50]
actual result: 0.515, minimized result: 0.485
match done in 415.2s
2 1 ( 50 30 50 ) *
suggested param values: [50, 70, 130]
base engine param: [50, 30, 50]
actual result: 0.5, minimized result: 0.5
match done in 422.6s
3 1 ( 50 40 70 )
suggested param values: [50, 40, 90]
base engine param: [50, 30, 50]
actual result: 0.445, minimized result: 0.5549999999999999
match done in 431.1s
4 1 ( 50 40 70 ) 1
suggested param values: [50, 40, 120]
base engine param: [50, 30, 50]
actual result: 0.47, minimized result: 0.53
match done in 424.4s
suggested param values: [50, 60, 70]
base engine param: [50, 30, 50]
...
I have set CACHE_FILE, but so far no cache file has been saved in the folder. It turns out the data are only written to the file when I interrupt the optimization (with Ctrl+C, for example) or when the optimization finishes normally. If I am hit by a power interruption, can NOMAD save the optimization data to the file? I guess not, since the computer can no longer process anything. I would like to suggest that NOMAD save the optimization data after every completed evaluation, that is, after I send the objective value, so that the previous data survive a power interruption.
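Until NOMAD saves its cache periodically, one user-side workaround is to record each evaluation yourself as soon as it finishes and force it to disk, so it survives a crash or power loss. A minimal sketch that does not touch the PyNomad API: wrap the plain Python function that computes the objective and call the wrapped version from your PyNomad callback. The name `durable_log` and the one-line-per-evaluation format (coordinates, then objective, space-separated) are assumptions for illustration.

```python
import os
import tempfile
from typing import Callable, List


def durable_log(path: str, func: Callable[[List[float]], float]) -> Callable[[List[float]], float]:
    """Wrap func so every completed evaluation is appended to path and synced to disk."""
    def wrapped(x: List[float]) -> float:
        f = func(x)
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(" ".join(repr(v) for v in x) + f" {f!r}\n")
            fh.flush()             # move Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to write the line to disk before returning
        return f
    return wrapped


def booth(x: List[float]) -> float:
    return (x[0] + 2 * x[1] - 7) ** 2 + (2 * x[0] + x[1] - 5) ** 2


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log_path = os.path.join(d, "evals.log")
        logged_booth = durable_log(log_path, booth)
        logged_booth([1.0, 3.0])
        logged_booth([0.0, 0.0])
        with open(log_path, encoding="utf-8") as fh:
            print(fh.read(), end="")  # "1.0 3.0 0.0" then "0.0 0.0 74.0"
```

Inside `objective_f` from the interrupt/resume script, this would mean computing `f = logged_booth([x, y])` instead of evaluating the formula directly. The file only records evaluated points; it does not capture NOMAD's internal state such as the mesh size.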
Hi Ferdinand,
As you noticed, the cache file is not periodically saved. We could implement that for the next version. We are also currently working on ways that NOMAD may suggest points and then stop, so it would not have to wait for the evaluations. When the evaluations are available, NOMAD could be updated and continue by suggesting next points.
For now, as a workaround, you can use the HISTORY_FILE. It lists the evaluated points with their evaluations, in a format similar to, but slightly different from, the cache file. If you convert that history file into an updated cache file, you would lose other algorithmic information (for instance, the current mesh size), but you would keep the previously evaluated points. We are also developing a way to use HISTORY_FILE seamlessly; it will be available soon.
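A sketch of the conversion described above, under two assumptions that should be checked against your NOMAD version: each history line holds the point's coordinates followed by the objective value, space-separated, and the cache file layout matches the dumps earlier in this thread (`CACHE_HITS`, `BB_OUTPUT_TYPE OBJ`, then one `( x ... ) EVAL_OK ( f )` line per point). `history_to_cache` is a hypothetical helper; it handles a single objective with no constraints, and starting `CACHE_HITS` at 0 is also an assumption.

```python
from typing import List


def history_to_cache(history_lines: List[str], n_vars: int) -> str:
    """Rewrite 'x1 ... xn f' history lines as cache lines '( x1 ... xn ) EVAL_OK ( f )'."""
    out = ["CACHE_HITS 0", "BB_OUTPUT_TYPE OBJ"]
    for line in history_lines:
        fields = line.split()
        if len(fields) != n_vars + 1:
            continue  # skip blank or unexpected lines
        coords, obj = fields[:n_vars], fields[n_vars]
        out.append(f"( {' '.join(coords)} ) EVAL_OK ( {obj} )")
    return "\n".join(out) + "\n"


history = ["0 0 74", "1 3 0"]
print(history_to_cache(history, n_vars=2), end="")
```

Values are copied as text rather than parsed as floats, so no precision is lost in the conversion.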
I hope this helps,
Viviane
Is there a way to resume an optimization after it is interrupted? By resume, I mean that the optimizer takes the past trials into account.
I also thought about saving the tried parameters and their objective values to a file and loading it when resuming the optimization. Does NOMAD support preloading parameters and objective values?