Open ValentinKaisermayer opened 2 years ago
Warning: SubproblemManager::clear() called on non-empty SubproblemManager
Error: Subproblem not found for Algorithm MADS
Call stack:
Main
MADS
MADS Initialization
Call stack:
Main
MADS
terminate called after throwing an instance of 'NOMAD_4_0_0::StepException'
what(): NOMAD::Exception thrown (/workspace/srcdir/nomad/src/Algos/SubproblemManager.cpp, 102) Warning: SubproblemManager could not remove subproblem for Algorithm MADS
signal (22): SIGABRT
@ValentinKaisermayer this is strange. The only way this could happen is if you solve several subproblems of the main optimization problem, but I do not see how you could access this feature via NOMAD.jl. Could you provide an example that triggers it?
If you use parallelism, try to avoid it, as mixing Julia and C++ is not really robust with threads at this time.
Ok, it is the multi-threading. Would downgrading Julia to the LTS version help?
I am not sure it would change anything. Correctly calling C/C++ code from a multithreaded Julia environment is a long-standing issue; see for example https://github.com/JuliaLang/julia/issues/17573 or https://github.com/jump-dev/KNITRO.jl/issues/93.
If you have a short example that triggers this issue, I could use it to investigate further. Otherwise, you can run the NOMAD solver directly from the command line and have it call your parallelized Julia blackbox (unfortunately, you will pay a time penalty for Julia's startup and precompilation). In that case, it should work.
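A minimal sketch of what such a command-line blackbox could look like, assuming NOMAD's usual blackbox convention: the solver passes the path of a text file holding the trial point as the first argument and reads the objective value from standard output. The file name `bb.jl` and the helper `evaluate_point_file` are hypothetical and not part of NOMAD or NOMAD.jl.

```julia
# bb.jl: hypothetical blackbox script for the stand-alone NOMAD executable.
# Assumption: NOMAD calls the blackbox with one argument, the path of a text
# file containing the whitespace-separated coordinates of the trial point,
# and reads the objective value from standard output.

rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2

# Read the trial point stored in `path` and return the objective value.
function evaluate_point_file(path::AbstractString; p = [1.0, 100.0])
    x = parse.(Float64, split(read(path, String)))
    return rosenbrock(x, p)
end

# Only evaluate when run as a script with an argument,
# e.g. `julia bb.jl point.txt`.
if abspath(PROGRAM_FILE) == @__FILE__ && !isempty(ARGS)
    println(evaluate_point_file(ARGS[1]))
end
```

The NOMAD parameter file would then name this script as its blackbox executable; the exact parameter syntax should be taken from the NOMAD user guide. Each evaluation starts a fresh Julia process, which is where the startup penalty mentioned above comes from.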
I was using multi-threading to solve multiple optimization problems in parallel.
I'm not able to share my original code but maybe I can come up with a MWE.
import GalacticOptim
import NOMAD
rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
p = [1.0, 100.0]
sols = Dict()
Threads.@threads for i in 1:10
    x0 = -1 .+ 2 .* rand(2)
    optprob = GalacticOptim.OptimizationFunction(rosenbrock)
    prob = GalacticOptim.OptimizationProblem(optprob, x0, p, lb=[-1.0, -1.0], ub=[1.0, 1.0])
    sol = GalacticOptim.solve(prob, GalacticOptim.NOMADOpt())
    sols[i] = sol
end
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa513a7192 -- local_Rb_tree_rotate_left at /workspace/srcdir/gcc-11.1.0/libstdc++-v3/src/c++98\tree.cc:138 [inlined]
_Rb_tree_insert_and_rebalance at /workspace/srcdir/gcc-11.1.0/libstdc++-v3/src/c++98\tree.cc:278
in expression starting at test_NOMAD.jl:8
local_Rb_tree_rotate_left at /workspace/srcdir/gcc-11.1.0/libstdc++-v3/src/c++98\tree.cc:138 [inlined]
_Rb_tree_insert_and_rebalance at /workspace/srcdir/gcc-11.1.0/libstdc++-v3/src/c++98\tree.cc:278
_ZNSt8_Rb_treeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt4pairIKS5_S5_ESt10_Select1stIS8_ESt4lessIS5_ESaIS8_EE16_M_insert_uniqueIRS6_IS5_S5_EEES6_ISt17_Rb_tree_iteratorIS8_EbEOT_.constprop.1025 at .julia\artifacts\afc55cad05cdd4def2b270e9e0e8261c242e5645\bin\libnomadUtils.dll (unknown line)
_ZN11NOMAD_4_0_010Parameters17registerAttributeIbJRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_S9_EEEvS7_T_bbbDpOT0_ at .julia\artifacts\afc55cad05cdd4def2b270e9e0e8261c242e5645\bin\libnomadUtils.dll (unknown line)
_ZN11NOMAD_4_0_010Parameters18registerAttributesERKSt6vectorINS_19AttributeDefinitionESaIS2_EE at .julia\artifacts\afc55cad05cdd4def2b270e9e0e8261c242e5645\bin\libnomadUtils.dll (unknown line)
_ZN11NOMAD_4_0_013RunParameters4initEv at .julia\artifacts\afc55cad05cdd4def2b270e9e0e8261c242e5645\bin\libnomadUtils.dll (unknown line)
createNomadProblem at .julia\artifacts\afc55cad05cdd4def2b270e9e0e8261c242e5645\bin\libnomadCInterface.dll (unknown line)
create_c_nomad_problem at .julia\packages\NOMAD\sZkMd\src\c_wrappers.jl:65
unknown function (ip: 000000003f72e96f)
solve at .julia\packages\NOMAD\sZkMd\src\core.jl:608
#__solve#146 at .julia\packages\GalacticOptim\diXWZ\src\solve\nomad.jl:80
__solve at .julia\packages\GalacticOptim\diXWZ\src\solve\nomad.jl:44 [inlined]
#solve#480 at .julia\packages\SciMLBase\jj8Ix\src\solve.jl:3 [inlined]
solve at .julia\packages\SciMLBase\jj8Ix\src\solve.jl:3
unknown function (ip: 000000003f72d370)
macro expansion at test_NOMAD.jl:12 [inlined]
#221#threadsfor_fun at .\threadingconstructs.jl:85
#221#threadsfor_fun at .\threadingconstructs.jl:52
unknown function (ip: 000000003f6f1da3)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
start_task at /cygdrive/c/buildbot/worker/package_win64/build/src\task.c:877
Allocations: 42927396 (Pool: 42910858; Big: 16538); GC: 40
julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_PKG_PRECOMPILE_AUTO = 0
[a75be94c] GalacticOptim v2.3.1
[02130f1c] NOMAD v2.2.0
With an Optim.jl solver it works fine.
sols = Dict()
Threads.@threads for i in 1:10
    x0 = -1 .+ 2 .* rand(2)
    optprob = GalacticOptim.OptimizationFunction(rosenbrock, GalacticOptim.AutoForwardDiff())
    prob = GalacticOptim.OptimizationProblem(optprob, x0, p, lb=[-1.0, -1.0], ub=[1.0, 1.0])
    sol = GalacticOptim.solve(prob, Optim.Fminbox(Optim.NelderMead()))
    sols[i] = sol.u
end
julia> sols
Dict{Any, Any} with 10 entries:
5 => [1.0, 1.0]
4 => [1.0, 1.0]
6 => [-0.628642, 1.49224]
7 => [-1.11821, -0.90395]
2 => [-0.14978, -1.2824]
10 => [1.0, 1.0]
9 => [1.0, 1.0]
8 => [1.0, 1.0]
3 => [1.0, 1.0]
1 => [1.0, 1.0]
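Separately from the NOMAD crash, both threaded loops above write to a shared `Dict` from several threads, and Julia's `Dict` is not safe for concurrent modification. A minimal sketch of a safer way to collect results, writing each one into its own slot of a preallocated vector; `solve_one` is a hypothetical stand-in for the `GalacticOptim.solve` call so that the snippet runs without any solver installed.

```julia
# Hypothetical stand-in for the solver call; it simply returns the known
# minimizer of the Rosenbrock function so the sketch runs by itself.
solve_one(x0, p) = [p[1], p[1]^2]

p = [1.0, 100.0]
n = 10
sols = Vector{Vector{Float64}}(undef, n)   # one slot per problem

Threads.@threads for i in 1:n
    x0 = -1 .+ 2 .* rand(2)
    sols[i] = solve_one(x0, p)   # each iteration writes only to index i
end
```

This only removes the race on the result container. It does not make concurrent NOMAD calls safe, since that crash occurs inside the C++ library.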
Ok, I confirm I can reproduce the bug on my machine (macOS). With a Distributed strategy, it works in my environment. The bug must come from the C++ part (maybe due to the use of a static cache).
I will keep this example and investigate, but there is a significant risk that the required modifications, if they are made, will be considerable. Hence it will take a lot of time.
I will add a note to the documentation stating that thread parallelism should not be used with this package. With Distributed, it should work.
Thanks once again for the example and your feedback.
Can you share the @distributed version that works?
import GalacticOptim
import NOMAD
import Distributed
import SharedArrays
Distributed.@everywhere begin
    rosenbrock(x, p) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
    p = [1.0, 100.0]
end
sols = SharedArrays.SharedArray{Float64}(2,10)
Distributed.@sync Distributed.@distributed for i in 1:10
    x0 = -1 .+ 2 .* rand(2)
    optprob = GalacticOptim.OptimizationFunction(rosenbrock)
    prob = GalacticOptim.OptimizationProblem(optprob, x0, p, lb=[-1.0, -1.0], ub=[1.0, 1.0])
    sol = GalacticOptim.solve(prob, GalacticOptim.NOMADOpt())
    sols[:,i] = sol.u
end
println(sols)
Machine version
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.5.0)
CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Running this script with four processors:
julia -p 4 script.jl
From worker 4: BBE OBJ
From worker 4: 1 33.73713
From worker 4: 2 22.393944
From worker 4: 4 6.374351
From worker 4: 6 2.731737
From worker 4: 9 0.124707
From worker 3: BBE OBJ
From worker 3: 1 20.556451
From worker 3: 2 7.68111
From worker 2: BBE OBJ
From worker 2: 1 68.819916
From worker 2: 2 7.223481
From worker 2: 4 0.995488
From worker 3: 6 3.952837
From worker 4: 10 0.005097
From worker 3: 9 3.338515
From worker 3: 15 3.334383
From worker 4: 19 0.003075
From worker 4: 21 0.001244
From worker 2: 20 0.851151
From worker 5: BBE OBJ
From worker 5: 1 24.497929
From worker 5: 2 4.371252
From worker 5: 6 0.476184
From worker 3: 29 2.257916
From worker 4: 28 0.000877
From worker 2: 25 0.697234
From worker 2: 26 0.675916
From worker 2: 34 0.670079
From worker 2: 36 0.657172
From worker 2: 37 0.641443
From worker 2: 38 0.626147
From worker 2: 39 0.595923
From worker 2: 40 0.588393
From worker 2: 42 0.536443
From worker 2: 43 0.48392
From worker 2: 46 0.412762
From worker 2: 47 0.359362
From worker 2: 50 0.305265
From worker 2: 51 0.246524
From worker 2: 57 0.214063
From worker 2: 58 0.208873
From worker 2: 59 0.205061
From worker 2: 61 0.172935
From worker 2: 64 0.146752
From worker 2: 68 0.130228
From worker 2: 69 0.125943
From worker 2: 70 0.104491
From worker 2: 73 0.07683
From worker 5: 21 0.466619
From worker 3: 31 1.184677
From worker 4: 38 0
From worker 2: 85 0.064659
From worker 2: 86 0.059078
From worker 2: 87 0.000252
From worker 3: 50 0.989552
From worker 3: 51 0.884018
From worker 3: 52 0.598236
From worker 3: 53 0.377092
From worker 3: 55 0.308768
From worker 3: 57 0.269274
From worker 3: 59 0.218733
From worker 3: 64 0.157411
From worker 3: 65 0.111961
From worker 3: 69 0.088518
From worker 3: 70 0.067514
From worker 3: 72 0.046627
From worker 3: 77 0.026996
From worker 3: 80 0.020108
From worker 5: 55 0.342873
From worker 3: 86 0.015191
From worker 3: 87 0.009445
From worker 5: 58 0.306074
From worker 5: 59 0.268518
From worker 5: 67 0.260619
From worker 5: 68 0.243413
From worker 5: 71 0.216656
From worker 5: 72 0.204304
From worker 5: 74 0.196287
From worker 5: 75 0.168647
From worker 5: 76 0.147212
From worker 5: 77 0.137025
From worker 5: 78 0.113907
From worker 5: 79 0.086064
From worker 5: 80 0.054341
From worker 5: 86 0.041428
From worker 5: 89 0.031191
From worker 3: 96 0.00081
From worker 2: 106 0
From worker 5: 103 0.028363
From worker 5: 104 0.002146
From worker 3: 113 0.000003
From worker 5: 120 0.000002
From worker 5: 121 0
From worker 4: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 4:
From worker 4: Best feasible solution: #2736 ( 0.999938 0.999927 ) Evaluation OK f = 0 h = 0
From worker 4:
From worker 4: Best infeasible solution: Undefined.
From worker 4:
From worker 4: Blackbox evaluations: 125
From worker 4: Total sgte evaluations: 5301
From worker 4: Cache hits: 79
From worker 4: Total number of evaluations: 204
From worker 3: 127 0.000002
From worker 3: 129 0
From worker 2: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 2:
From worker 2: Best feasible solution: #3320 ( 0.999969 0.999952 ) Evaluation OK f = 0 h = 0
From worker 2:
From worker 2: Best infeasible solution: Undefined.
From worker 2:
From worker 2: Blackbox evaluations: 185
From worker 2: Total sgte evaluations: 5177
From worker 2: Cache hits: 67
From worker 2: Total number of evaluations: 252
From worker 4: 1 19.168576
From worker 4: 2 9.283188
From worker 4: 5 3.979058
From worker 4: 6 0.99815
From worker 4: 20 0.644646
From worker 4: 23 0.485317
From worker 4: 26 0.37442
From worker 4: 31 0.364017
From worker 4: 33 0.28892
From worker 4: 35 0.286807
From worker 4: 37 0.278017
From worker 4: 39 0.266063
From worker 4: 42 0.256852
From worker 4: 43 0.246863
From worker 4: 44 0.240982
From worker 4: 45 0.224452
From worker 4: 46 0.196784
From worker 4: 47 0.16513
From worker 4: 49 0.134775
From worker 4: 50 0.126065
From worker 4: 51 0.111366
From worker 4: 54 0.091919
From worker 4: 56 0.077657
From worker 4: 60 0.059894
From worker 4: 61 0.055747
From worker 4: 62 0.000024
From worker 5: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 5:
From worker 5: Best feasible solution: #3604 ( 0.999909 0.999748 ) Evaluation OK f = 0 h = 0
From worker 5:
From worker 5: Best infeasible solution: Undefined.
From worker 5:
From worker 5: Blackbox evaluations: 208
From worker 5: Total sgte evaluations: 5510
From worker 5: Cache hits: 57
From worker 5: Total number of evaluations: 265
From worker 3: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 3:
From worker 3: Best feasible solution: #5412 ( 0.99991 0.999782 ) Evaluation OK f = 0 h = 0
From worker 3:
From worker 3: Best infeasible solution: Undefined.
From worker 3:
From worker 3: Blackbox evaluations: 204
From worker 3: Total sgte evaluations: 6743
From worker 3: Cache hits: 86
From worker 3: Total number of evaluations: 290
From worker 2: 1 50.495849
From worker 4: 84 0
From worker 2: 2 20.810501
From worker 2: 3 11.26114
From worker 2: 6 0.390202
From worker 2: 14 0.182232
From worker 5: 1 21.44384
From worker 5: 3 0.716439
From worker 5: 6 0.20573
From worker 3: 1 11.467042
From worker 3: 2 11.425159
From worker 3: 6 2.999912
From worker 3: 8 2.570502
From worker 3: 9 0.983477
From worker 3: 11 0.391366
From worker 3: 13 0.269394
From worker 5: 9 0.055966
From worker 2: 28 0.063012
From worker 5: 18 0.048939
From worker 5: 22 0.00205
From worker 2: 37 0.040386
From worker 2: 45 0.033811
From worker 2: 47 0.032364
From worker 2:
From worker 2: BBE OBJ
From worker 2: 49 0.015625
From worker 2: 50 0.014279
From worker 2: 52 0.001785
From worker 3: 17 0.268737
From worker 2: 68 0
From worker 3: 33 0.259737
From worker 3: 34 0.239235
From worker 3: 35 0.221269
From worker 3: 39 0.199595
From worker 3: 40 0.173725
From worker 3:
From worker 3: BBE OBJ
From worker 3: 42 0.151642
From worker 3: 44 0.135033
From worker 3: 45 0.120088
From worker 3: 47 0.078947
From worker 3: 48 0.057955
From worker 3: 56 0.035913
From worker 3: 62 0.028929
From worker 3: 63 0.01818
From worker 3: 65 0.012553
From worker 3: 73 0.007808
From worker 5: 40 0
From worker 3: 78 0.000358
From worker 4: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 4:
From worker 4: Best feasible solution: #2393 ( 0.999981 0.999909 ) Evaluation OK f = 0 h = 0
From worker 4:
From worker 4: Best infeasible solution: Undefined.
From worker 4:
From worker 4: Blackbox evaluations: 174
From worker 4: Total sgte evaluations: 4749
From worker 4: Cache hits: 68
From worker 4: Total number of evaluations: 242
From worker 3: 92 0.000106
From worker 3: 98 0.000002
From worker 2: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 2:
From worker 2: Best feasible solution: #1818 ( 0.999988 0.999939 ) Evaluation OK f = 0 h = 0
From worker 2:
From worker 2: Best infeasible solution: Undefined.
From worker 2:
From worker 2: Blackbox evaluations: 149
From worker 2: Total sgte evaluations: 4292
From worker 2: Cache hits: 79
From worker 2: Total number of evaluations: 228
From worker 2: 1 223.603799
From worker 2: 2 2.13726
From worker 5: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 5:
From worker 5: Best feasible solution: #2269 ( 0.999974 0.999971 ) Evaluation OK f = 0 h = 0
From worker 5:
From worker 5: Best infeasible solution: Undefined.
From worker 5:
From worker 5: Blackbox evaluations: 127
From worker 5: Total sgte evaluations: 4087
From worker 5: Cache hits: 60
From worker 5: Total number of evaluations: 187
From worker 2: 20 2.113913
From worker 2: 22 2.093671
From worker 2: 24 1.968391
From worker 2: 27 1.866073
From worker 2: 29 1.694038
From worker 2: 35 1.649773
From worker 2: 36 1.63325
From worker 2: 39 1.612402
From worker 2: 40 1.581272
From worker 2: 41 1.53912
From worker 2: 42 1.531969
From worker 2: 43 1.418954
From worker 2: 44 1.359027
From worker 2: 48 1.230305
From worker 2: 49 1.181162
From worker 3: 114 0.000001
From worker 2: 52 1.082456
From worker 2: 53 0.979387
From worker 2: 55 0.970117
From worker 2: 57 0.865671
From worker 2: 60 0.775835
From worker 2: 63 0.718757
From worker 2: 64 0.631321
From worker 2: 68 0.605121
From worker 2: 69 0.566626
From worker 2: 70 0.533486
From worker 2: 72 0.43379
From worker 2: 73 0.364997
From worker 2: 77 0.266785
From worker 2: 81 0.237296
From worker 2: 82 0.184999
From worker 2: 87 0.150489
From worker 2: 91 0.137709
From worker 2: 92 0.123053
From worker 2: 93 0.099156
From worker 2:
From worker 2: BBE OBJ
From worker 2: 94 0.072941
From worker 2: 98 0.054943
From worker 2: 99 0.027963
From worker 2: 107 0.005423
From worker 2: 117 0.002977
From worker 2: 119 0.001142
From worker 3: 122 0
From worker 2: 128 0.000764
From worker 2: 134 0.000686
From worker 2: 136 0.000392
From worker 2: 139 0.000235
From worker 2: 140 0.000049
From worker 2: 152 0
From worker 3: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 3:
From worker 3: Best feasible solution: #5251 ( 0.999911 0.999848 ) Evaluation OK f = 0 h = 0
From worker 3:
From worker 3: Best infeasible solution: Undefined.
From worker 3:
From worker 3: Blackbox evaluations: 202
From worker 3: Total sgte evaluations: 6460
From worker 3: Cache hits: 74
From worker 3: Total number of evaluations: 276
From worker 3: 1 252.812005
From worker 3: 2 1.513057
From worker 3: 21 1.317592
From worker 3: 24 1.298625
From worker 3: 25 1.233473
From worker 3: 29 1.174417
From worker 3: 32 1.100755
From worker 3: 36 1.083512
From worker 3: 38 1.035323
From worker 3: 39 1.012917
From worker 3: 40 0.960168
From worker 3: 41 0.903475
From worker 3: 43 0.75423
From worker 3: 44 0.655924
From worker 3: 46 0.508425
From worker 3: 53 0.414748
From worker 3: 54 0.385368
From worker 3: 56 0.344196
From worker 3: 59 0.330914
From worker 3: 60 0.257732
From worker 3: 61 0.227407
From worker 3: 63 0.175866
From worker 3: 64 0.152041
From worker 3: 67 0.133797
From worker 3: 68 0.106231
From worker 3:
From worker 3: BBE OBJ
From worker 3: 72 0.048585
From worker 3: 73 0.025429
From worker 3: 79 0.00631
From worker 3: 80 0.003542
From worker 3: 87 0.003261
From worker 3: 94 0.001787
From worker 3: 101 0
From worker 2: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 2:
From worker 2: Best feasible solution: #3020 ( 0.99997 0.999921 ) Evaluation OK f = 0 h = 0
From worker 2:
From worker 2: Best infeasible solution: Undefined.
From worker 2:
From worker 2: Blackbox evaluations: 234
From worker 2: Total sgte evaluations: 5606
From worker 2: Cache hits: 69
From worker 2: Total number of evaluations: 303
From worker 3: A termination criterion is reached: No termination (all). Min mesh size reached (Algo)
From worker 3:
From worker 3: Best feasible solution: #1576 ( 0.999937 0.999939 ) Evaluation OK f = 0 h = 0
From worker 3:
From worker 3: Best infeasible solution: Undefined.
From worker 3:
From worker 3: Blackbox evaluations: 186
From worker 3: Total sgte evaluations: 3642
From worker 3: Cache hits: 62
From worker 3: Total number of evaluations: 248
[0.9997881664469066 0.9999382713454753 0.9998703958987838 0.9997789748396068 0.9998136026540877 0.9997451999415156 0.9997719949345826 0.999896376830331 0.9996399679942679 0.9999491129009155; 0.9995763703850106 0.9999140130561478 0.9997574494720326 0.9995823580700176 0.9996475566889773 0.9994386759300575 0.9995439268190351 0.9997978074811389 0.9992484632915588 0.99992145627526]
It crashed Julia. What does this mean?
Source in C++ code: https://github.com/bbopt/nomad/blob/cb3bb3543b14d9eb8aee270e9ef3b80a67f7708c/src/Algos/SubproblemManager.cpp#L117