Closed simonp0420 closed 1 year ago
Hi @simonp0420!
Do you use lock inside your objective function? (https://docs.julialang.org/en/v1/manual/multi-threading/)
It seems that you encountered a data-race condition.
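For readers unfamiliar with the pattern being asked about, a minimal sketch of guarding a shared accumulator with a lock might look like the following. The function name locked_sum and the sum-of-squares workload are illustrative assumptions, not code from this thread.

```julia
using Base.Threads

# Hypothetical example: sum of squares computed in a threaded loop.
# Every thread updates the same accumulator, so the update is wrapped in a lock.
function locked_sum(x)
    total = Ref(0.0)          # shared state, mutated by all threads
    lk = ReentrantLock()
    @threads for i in eachindex(x)
        v = x[i]^2            # thread-local work, no lock needed
        lock(lk) do
            total[] += v      # critical section: one thread at a time
        end
    end
    return total[]
end
```

Without the lock, the read-modify-write on the accumulator would be a data race and the result could vary from run to run.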
I am careful not to modify the same value from distinct threads in my code. I do use a Threads.SpinLock() where necessary. Is the multithreading used in NOMAD compatible with Julia's Threads?
We compiled a sequential version of NOMAD with Yggdrasil (https://github.com/JuliaPackaging/Yggdrasil/blob/master/N/NOMAD/build_tarballs.jl) to avoid issues with Julia's Threads.
Do you have an MWE so that I can try to find where the problem comes from?
Hi @simonp0420. If you are in a bit of a hurry, you can use the Nomad executable directly, with a configuration file, where your blackbox is the command julia -t n yourblackbox.jl. You can take a look at https://nomad-4-user-guide.readthedocs.io/en/latest/HowToUseNomad.html for more information. The only drawback is that you will pay a small time penalty (around 1s on my machine) for the startup of julia.
Looking at your logs, it is strange that it does not find the libc.so library or libNomadAlgos.so. Did you compile Julia yourself, or did you install it from the website?
I am sorry about the long delay in responding to your answers. For some reason I haven't received notifications of these and I just happened to glance back here. Anyway...
@amontoison I can supply the code I tried to use but it requires installing my PSSFSS package which is not a small, minimal example. Let me know if you would like me to do this or if I should try to come up with a different, much smaller example.
@salomonl, I installed Julia by downloading it (actually using Jill.py on Linux and Chocolatey on Windows). I'm not in a hurry, as there are plenty of other optimizers available for my problem. I would like to see if NOMAD is more efficient or arrives at a better solution than, say, CMAEvolutionStrategy.jl.
@simonp0420 I have access to a Linux machine (Fedora). If you want, I can take a look at your example. Otherwise, if you have a smaller example, you are welcome to share it.
I tried with a silly @threads loop blackbox function, but it does not fail.
@salomonl, thanks for your offer to look at my example. I haven't been able to generate a MWE that also exhibits the seg fault, so I'm guessing that I'm doing something wrong with threading. I hope this isn't a waste of your time, but here is my failing example:
using PSSFSS, NOMAD
using Dates: now
let bestf = typemax(Float64)
    global bb
    """
        (success, counteval, [objective, c1, c2, c3, c4, c5, c6]) = bb(x)
    x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
    ao = bo = ac = bc = period; ai = √2*period, bi = period/√2
    constraints to be held ≤ 0:
    c1 = ho - 0.99*bo
    c2 = hi - 0.99*bi
    c3 = hc - 0.99*bc
    c4 = 2.05*wo - ho
    c5 = 2.05*wi - hi
    c6 = 2.05*wc - hc
    """
    function bb(x)
        period, wo, ho, wi, hi, wc, hc, t1, t2 = x
        ao = bo = ai = bi = ac = bc = period
        ai *= √2
        bi /= √2
        c1 = ho - 0.99*bo
        c2 = hi - 0.99*bi
        c3 = hc - 0.99*bc
        c4 = 2.05*wo - ho
        c5 = 2.05*wi - hi
        c6 = 2.05*wc - hc
        returnval = [5000.0,c1,c2,c3,c4,c5,c6]
        any(returnval[2:end] .> 0) && (return (false, false, returnval))
        outer(rot) = meander(a=ao, b=bo, w1=wo, w2=wo, h=ho, units=mm, ntri=400, rot=rot)
        inner(rot) = meander(a=ai, b=bi, w1=wi, w2=wi, h=hi, units=mm, ntri=400, rot=rot)
        center(rot) = meander(a=ac, b=bc, w1=wc, w2=wc, h=hc, units=mm, ntri=400, rot=rot)
        substrate = Layer(width=0.1mm, epsr=2.6)
        foam(w) = Layer(width=w, epsr=1.05)
        rot0 = 0
        strata = [
            Layer()
            outer(rot0)
            substrate
            foam(t1*1mm)
            inner(rot0 - 45)
            substrate
            foam(t2*1mm)
            center(rot0 - 2*45)
            substrate
            foam(t2*1mm)
            inner(rot0 - 3*45)
            substrate
            foam(t1*1mm)
            outer(rot0 - 4*45)
            substrate
            Layer() ]
        steering = (θ=0, ϕ=0)
        flist = 11:0.25:19
        results = analyze(strata, flist, steering, showprogress=false)
        s11rr, s21ll, ar11db, ar21db = eachcol(extract_result(results,
            @outputs s11db(R,R) s21db(L,L) ar11db(R) ar21db(L)))
        RL = -s11rr
        IL = -s21ll
        obj = maximum(vcat(RL,IL,ar11db,ar21db))
        returnval[1] = obj
        if obj < bestf
            bestf = obj
            open("optimization_best.log", "a") do fid
                xround = map(t -> round(t, digits=4), x)
                println(fid, round(obj,digits=4), " at x = ", xround, " #", now())
            end
        end
        return (true, true, returnval)
    end
end
# x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
xmin = [3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.5, 1.5]
xmax = [5.5, 0.35,4.0, 0.35, 4.0, 0.35, 4.0, 6.0, 6.0]
x0 = 0.5 * (xmin + xmax)
nb_inputs = length(x0)
nb_outputs = 7
output_types = ["OBJ"; repeat(["EB"], nb_outputs-1)]
prob = NomadProblem(nb_inputs, nb_outputs, output_types, bb;
                    lower_bound = xmin,
                    upper_bound = xmax,
                    granularity = 1.e-3 * ones(nb_inputs))
isfile("optimization_best.log") && rm("optimization_best.log")
result = solve(prob, x0)
Thanks for looking at it.
Whoops, the comment in my code was incorrect. c1, c2, ...c6 are all to be held less than or equal to zero. I've edited the comment to correct this. I believe the executable code is/was correct.
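The sign convention can be checked in isolation, without PSSFSS. The sketch below recomputes only the six constraints from the posted code and tests them against c ≤ 0. The helper names constraints and isfeasible are hypothetical, and the infeasible point is made up for illustration.

```julia
# Constraint values c1..c6, copied from the arithmetic in bb(x).
function constraints(x)
    period, wo, ho, wi, hi, wc, hc, t1, t2 = x
    bo = bc = period
    bi = period / √2
    return [ho - 0.99*bo, hi - 0.99*bi, hc - 0.99*bc,
            2.05*wo - ho, 2.05*wi - hi, 2.05*wc - hc]
end

# A point is feasible when every constraint is ≤ 0.
isfeasible(x) = all(constraints(x) .<= 0)

xmin = [3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 1.5, 1.5]
xmax = [5.5, 0.35, 4.0, 0.35, 4.0, 0.35, 4.0, 6.0, 6.0]
x0 = 0.5 * (xmin + xmax)        # the starting point used in the post

# Made-up infeasible point: ho = 5.0 exceeds 0.99*period, so c1 > 0.
xbad = copy(x0)
xbad[3] = 5.0

println(isfeasible(x0), " ", isfeasible(xbad))
```

This matches the early return in bb(x), which rejects a point as soon as any constraint is positive.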
On Tue, Jun 1, 2021 at 11:39 AM salomonl wrote:
@simonp0420 Thank you for the code. To be sure, are the constraints that you want to satisfy of the form c(x) >= 0?
@simonp0420 I apologize for the long delay. I can confirm that I am able to reproduce the bug with your example, even when I use my own version of libNomadCInterface. For the moment, I do not have a solution, as I am not at all familiar with interfacing Julia multithreading with C/C++. Apparently, this part of Julia is not yet stabilized; see for example https://github.com/jump-dev/KNITRO.jl/issues/93 or https://github.com/jump-dev/Ipopt.jl/issues/257 .
As an alternative, you can use the Nomad executable from the command line. Here is an example of such a procedure.
I rewrote your code as a blackbox, bb_multithread.jl, in the standard Nomad blackbox format:
using ArgParse
using PSSFSS
using Dates: now
"""
    (success, counteval, [objective, c1, c2, c3, c4, c5, c6]) = bb(x)
x = [period, wo, ho, wi, hi, wc, hc, t1, t2]
ao = bo = ac = bc = period; ai = √2*period, bi = period/√2
constraints to be held ≤ 0:
c1 = ho - 0.99*bo
c2 = hi - 0.99*bi
c3 = hc - 0.99*bc
c4 = 2.05*wo - ho
c5 = 2.05*wi - hi
c6 = 2.05*wc - hc
"""
function bb(x)
    period, wo, ho, wi, hi, wc, hc, t1, t2 = x
    ao = bo = ai = bi = ac = bc = period
    ai *= √2
    bi /= √2
    c1 = ho - 0.99*bo
    c2 = hi - 0.99*bi
    c3 = hc - 0.99*bc
    c4 = 2.05*wo - ho
    c5 = 2.05*wi - hi
    c6 = 2.05*wc - hc
    returnval = [5000.0,c1,c2,c3,c4,c5,c6]
    any(returnval[2:end] .> 0) && (return (false, false, returnval))
    outer(rot) = meander(a=ao, b=bo, w1=wo, w2=wo, h=ho, units=mm, ntri=400, rot=rot)
    inner(rot) = meander(a=ai, b=bi, w1=wi, w2=wi, h=hi, units=mm, ntri=400, rot=rot)
    center(rot) = meander(a=ac, b=bc, w1=wc, w2=wc, h=hc, units=mm, ntri=400, rot=rot)
    substrate = Layer(width=0.1mm, epsr=2.6)
    foam(w) = Layer(width=w, epsr=1.05)
    rot0 = 0
    strata = [
        Layer()
        outer(rot0)
        substrate
        foam(t1*1mm)
        inner(rot0 - 45)
        substrate
        foam(t2*1mm)
        center(rot0 - 2*45)
        substrate
        foam(t2*1mm)
        inner(rot0 - 3*45)
        substrate
        foam(t1*1mm)
        outer(rot0 - 4*45)
        substrate
        Layer() ]
    steering = (θ=0, ϕ=0)
    flist = 11:0.25:19
    results = analyze(strata, flist, steering, showprogress=false)
    s11rr, s21ll, ar11db, ar21db = eachcol(extract_result(results,
        @outputs s11db(R,R) s21db(L,L) ar11db(R) ar21db(L)))
    RL = -s11rr
    IL = -s21ll
    obj = maximum(vcat(RL,IL,ar11db,ar21db))
    returnval[1] = obj
    return (true, true, returnval)
end
# This blackbox takes as input a file containing the coordinates of the point you want to evaluate...
s = ArgParseSettings()
@add_arg_table s begin
    "filename"
        required = true
end
parsed_args = parse_args(ARGS, s)
input_values = begin
    open(parsed_args["filename"], "r") do file
        lines = readline(file)
        [parse(Float64, elt) for elt in split(lines, " ")]
    end
end
# ... and prints the blackbox outputs to standard output...
bb_outputs = bb(input_values)
for elt in bb_outputs[3]
    print(elt)
    print(" ")
end
println()
# ...with an exit code indicating whether the evaluation succeeded.
if bb_outputs[1] == true
    exit(0)
else
    exit(1)
end
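Before handing the script to Nomad, the input side of this contract can be exercised by hand. The sketch below writes a point to a temporary file as a single space-separated line, which is the layout the parsing code above assumes, and reads it back the same way. The file path and the variable names are illustrative only.

```julia
# Candidate point (the X0 of the original problem).
x0 = [4.25, 0.225, 2.05, 0.225, 2.05, 0.225, 2.05, 3.75, 3.75]

# Write it as one space-separated line, as the blackbox script expects.
path = tempname()
write(path, join(x0, " ") * "\n")

# Read it back exactly as bb_multithread.jl does.
parsed = open(path, "r") do file
    [parse(Float64, elt) for elt in split(readline(file), " ")]
end

println(parsed)
# With PSSFSS installed, the script itself could then be run on this file,
# e.g. `julia -t 2 bb_multithread.jl <path>` from a shell.
```

Note that split(lines, " ") yields empty strings if the line contains consecutive spaces; calling split(lines) with no delimiter splits on any run of whitespace and would be more tolerant.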
In the same folder as bb_multithread.jl, provide a param.txt file which gives the properties of the blackbox.
DIMENSION 9 # number of variables
# Choose N as the number of threads available on your machine.
BB_EXE "$julia $-tN bb_multithread.jl"
BB_OUTPUT_TYPE OBJ EB EB EB EB EB EB
X0 ( 4.25 0.225 2.05 0.225 2.05 0.225 2.05 3.75 3.75 ) # starting point
LOWER_BOUND ( -3.0 0.1 0.1 0.1 0.1 0.1 0.1 1.5 1.56 )
UPPER_BOUND ( 5.5 0.35 4.0 0.35 4.0 0.35 4.0 6.0 6.0 )
GRANULARITY * 0.001
MAX_BB_EVAL 50 # the algorithm terminates when
# 50 black-box evaluations have
# been made
Other parameters exist, and you can get a history of your execution by adding some of them (see https://nomad-4-user-guide.readthedocs.io/en/latest/Appendix.html).
After typing the command $NOMAD_EXE_FOLDER/nomad param.txt, I obtain the following logs (on a Linux machine):
All variables are granular. MAX_EVAL is set to 1000000 to prevent algorithm from circling around best solution indefinetely
BBE OBJ
1 11.688 (Phase One) *
2 35.639
3 8.034 *
4 33.974
5 6.807 *
6 9.729
Warning: Evaluator returned exit status 256 for point: ( 3.95 0.235 1.15 0.255 2.85 0.265 2.65 5.95 2.75 )
Warning: Evaluator returned exit status 256 for point: ( 2.65 0.225 1.75 0.235 1.95 0.235 1.95 5.25 3.25 )
7 25.911
8 44.352
Warning: Evaluator returned exit status 256 for point: ( 3.95 0.235 1.15 0.255 2.85 0.265 2.65 5.95 2.75 )
10 9.768
11 7.133
12 9.186
13 49.18
14 10.126
Warning: Evaluator returned exit status 256 for point: ( 2.65 0.225 1.75 0.235 1.95 0.235 1.95 5.25 3.25 )
16 10.025
17 10.705
18 6.313 *
19 6.573
20 6.379
21 36.009
22 8.134
23 7.942
24 6.828
25 7.082
26 6.434
Warning: Evaluator returned exit status 256 for point: ( 3.05 0.235 1.45 0.215 0.25 0.255 1.55 5.95 3.35 )
Warning: Evaluator returned exit status 256 for point: ( 3.05 0.235 1.45 0.215 0.25 0.255 1.55 5.95 3.35 )
Warning: Evaluator returned exit status 256 for point: ( 4.25 0.215 2.95 0.185 3.35 0.205 1.75 5.95 3.35 )
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.245 2.55 0.335 0.65 0.245 1.75 5.95 3.45 )
Warning: Evaluator returned exit status 256 for point: ( 4.25 0.215 2.95 0.185 3.35 0.205 1.75 5.95 3.35 )
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.245 2.55 0.335 0.65 0.245 1.75 5.95 3.45 )
30 6.912
31 6.345
32 5.637 *
33 10.597
34 10.232
35 6.892
36 5.012 *
37 5.059
38 6.53
39 5.615
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.225 1.85 0.195 0.15 0.275 1.45 4.85 3.45 )
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.225 1.85 0.195 0.15 0.275 1.45 4.85 3.45 )
Warning: Evaluator returned exit status 256 for point: ( 2.65 0.225 0.85 0.125 0.65 0.285 2.85 5.85 3.35 )
41 22.831
Warning: Evaluator returned exit status 256 for point: ( 2.65 0.225 0.85 0.125 0.65 0.285 2.85 5.85 3.35 )
43 5.378
44 6.46
45 5.146
46 5.801
47 4.197 *
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.165 3.15 0.255 1.65 0.335 1.85 1.55 3.45 )
Warning: Evaluator returned exit status 256 for point: ( 3.15 0.165 3.15 0.255 1.65 0.335 1.85 1.55 3.45 )
BBE OBJ
49 6.31
50 3.78 *
Reached stop criterion: Max number of blackbox evaluations (Eval Global) 50
A termination criterion is reached: Max number of blackbox evaluations (Eval Global) Success found and opportunistic strategy is used 50
Best feasible solution: #5197 ( 3.35 0.155 1.95 0.135 0.85 0.325 1.75 4.35 3.45 ) Evaluation OK f = 3.78035 h = 0
Best infeasible solution: Undefined.
Blackbox evaluations: 50
Total sgte evaluations: 5032
Cache hits: 6
Total number of evaluations: 56
This is not a silver bullet, but I hope it will temporarily help you.
Thanks for looking at this and providing a workaround. I'm glad you were able to confirm the issue, as this is the first time I've ever used multithreading in any language and I suspected that the problem might be with my use of multithreading. I've used ArgParse before and I know that it adds its own substantial overhead to the already significant startup time for Julia. These two considerations, plus the need to modify the source and create the parameters file, make your workaround less attractive to me than simply running the original code in a single-threaded Julia session. I plan on doing this until the threading interface settles down and NOMAD and Julia can multithread harmoniously. Thanks again for the significant effort you put into looking at this.
Not exactly this, but for me NOMAD.jl segfaults when doing allocation in parallel loops (@threads for i in ...). A simple workaround I found was to execute my parallel function in another process using Distributed.jl. It's not very elegant, but it works fine and avoids the large performance/memory penalty that using Distributed.jl instead of threads would entail for my workload (lots of data transfer).
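A minimal sketch of that workaround, under stated assumptions: the threaded work runs in one worker process started with its own threads, and the objective passed to NOMAD only forwards the point to that worker. The function threaded_sum_of_squares, the thread count of 4, and the objective itself are placeholders, not the poster's code.

```julia
using Distributed

# Start one worker process with its own threads (4 is an arbitrary choice).
worker = only(addprocs(1; exeflags="--threads=4"))

# Define the threaded function on every process, including the worker.
@everywhere function threaded_sum_of_squares(x)
    partial = zeros(length(x))
    Threads.@threads for i in eachindex(x)
        partial[i] = x[i]^2       # each iteration writes its own slot
    end
    return sum(partial)
end

# Objective in the (success, count_eval, outputs) form used in this thread.
# The calling process never enters a threaded region itself.
function bb(x)
    obj = remotecall_fetch(threaded_sum_of_squares, worker, x)
    return (true, true, [obj])
end

println(bb([1.0, 2.0, 3.0]))
```

Only the input vector and the scalar result cross the process boundary, so the data-transfer cost stays small as long as the heavy data lives on the worker.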
@SobhanMP Thanks for the workaround tip. Perhaps the maintainers of NOMAD may have other ideas, but it sounds like you might be able to provide a simple minimal working example (MWE) of how using @threads causes a segfault with NOMAD. If so, would you consider posting it here? I was unable to generate a MWE. It may help the maintainers with debugging the problem.
using Base.Threads
using NOMAD
using LinearAlgebra
n = 5
A = randn(n, n)
function f(x)
    y = fill(0.0, nthreads())
    @threads for i in eachindex(x)
        for j in 1:100
            g = rand(10, 10)
        end
        y[threadid()] = x[i]
    end
    (true, true, [sum(y)])
end
pb = NomadProblem(n,
                  1,
                  ["OBJ"],
                  f;
                  upper_bound=[100.0 for _ in 1:n],
                  lower_bound=[0.0 for _ in 1:n])
pb.options.max_bb_eval = 3
result = @time NOMAD.solve(pb, rand(n))
display(result)
gives
Caught seg fault in thread 0
Caught seg fault in thread 0
terminate called after throwing an instance of 'Caught seg fault in thread Caught seg fault in thread 0
0
terminate called recursively
terminate called recursively
signal (6): Aborted
in expression starting at file.jl:25
signal (6): Aborted
in expression starting at file.jl:25
terminate called recursively
NOMAD_4_0_0::Exception'
signal (6): Aborted
in expression starting at file.jl:25
what(): NOMAD::Exception thrown (/workspace/srcdir/nomad/src/Algos/Step.cpp, 103) Caught seg fault
signal (6): Aborted
in expression starting at file.jl:25
gsignal at /lib64/libc.so.6 (unknown line)
malloc(): unaligned tcache chunk detected
gsignal at /lib64/libc.so.6 (unknown line)
signal (6): Aborted
in expression starting at file.jl:25
abort at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
gsignal at /lib64/libc.so.6 (unknown line)
malloc(): unaligned tcache chunk detected
signal (6): Aborted
in expression starting at file.jl:25
when run with
julia -t 5 file.jl
Commenting out the loop in lines 10-12, which causes allocation and garbage collection (I suspect this is the culprit), makes it work.
I'm using Julia 1.6.1 from the Julia website on Gentoo!
@SobhanMP thanks for posting your example. Hopefully this will be easier for the developers to test and debug than my case.
This should be closed; it's no longer an issue with the julia/version-1.8 branch.
I'm still seeing the same behavior for my application, both on 1.7.3 and 1.8-rc1: it works fine with Julia multithreading under Windows, but segfaults on Manjaro if I start Julia with more than a single thread.
The MWE I had no longer breaks when using the git version (the version-1.8 branch, not 1.8-rc1).
My bad, it seems like it just takes a bit longer to segfault.
@simonp0420 can you try my fork of NOMAD.jl? You can dev it by removing NOMAD.jl and running
]dev https://github.com/SobhanMP/NOMAD.jl
It's working! So far it's completed four evaluations of the objective function, running with 8 threads! I will continue to let it run and report back tomorrow. But things are looking very good so far :-)
: )
Still running (in the end game, I think). It has run without error for over 17 hours, with more than 3000 evaluations of the objective function. Congratulations! Great work! :clap: :clap: :clap:
Fixed by PR 59. Many thanks!
Firstly, thanks for making this great optimizer available to Julia users!
I have an expensive objective function that takes about 20 seconds to evaluate with threading enabled in Julia. When I try to optimize with NOMAD on Manjaro Linux, starting Julia with -t2, -t3, etc., on my 8-core machine, I get the following error (-t1 works fine, though slowly):
This error does not occur on my Windows machine. Here is my configuration:
I'm using NOMAD.jl v. 2.1.0.
Actually, from looking at the information my objective function writes out, it looks like the segfault is occurring in the objective function, presumably when the first Threads.@threads statement is encountered. But, as I noted previously, this error doesn't occur on my Windows machine, where I'm using 8 threads.