bbopt / nomad

NOMAD - A blackbox optimization software
https://nomad-4-user-guide.readthedocs.io/
GNU Lesser General Public License v3.0

Parallel Latin hypercube search #68

Open SobhanMP opened 2 years ago

SobhanMP commented 2 years ago

Hi, I have a NOMAD parameter file that goes roughly like this:

DIMENSION 17
NB_THREADS_OPENMP 18
BB_EXE "$./a.out"
BB_OUTPUT_TYPE OBJ
BB_INPUT_TYPE * R
LH_SEARCH 16 10
LOWER_BOUND * -10
UPPER_BOUND *  10

The first iteration of the LH loop (in this case 16 evals) doesn't seem to be parallelized. Am I doing something wrong, or is this working as intended?

ctribes commented 2 years ago

When testing on Linux (built with gcc 11) and OSX (built with gcc 11), evaluations are done in parallel for LH. When using OSX (built with Clang 13), OpenMP is not available and evaluations are done sequentially.

What is your platform and compiler?

SobhanMP commented 2 years ago

I'm on Linux, Version 4.2.0 Release, using OpenMP and SGTELIB; I compiled it with gcc 11.1.0. Just to be clear, it's the first 16 (or thereabouts) evals that are done sequentially; the rest are done in parallel.

ctribes commented 2 years ago

The behaviour you observe is not intended. I cannot reproduce it on Linux with gcc. I observe that the number of parallel evaluations is not maximal, but more than one evaluation runs in parallel for most of the initial LH search. I tested with a single evaluation lasting ~8 seconds.

Another approach for parallel evaluations consists in sending blocks of points for evaluation. https://github.com/bbopt/nomad/tree/master/examples/basic/batch/single_obj_parallel
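
To illustrate the idea, a block-evaluation blackbox could look roughly like the sketch below, written in Julia to match the scripts in this thread. It is only a sketch under a few assumptions: the parameter file sets something like BB_MAX_BLOCK_SIZE 4 so that NOMAD writes several points (one per line) to the input file, one objective value is printed per line in the same order (the convention used by the linked example), and the file name bb.jl and the sphere objective are placeholders.

#!/usr/bin/env julia
# Sketch of a block-evaluation blackbox (bb.jl): NOMAD passes a file with one
# point per line; the points are evaluated in parallel with Julia threads and
# one objective value is printed per line, in the same order.
# Start Julia with several threads, e.g. BB_EXE "$julia -t 4 bb.jl".

objective(x) = sum(abs2, x)          # placeholder objective (sphere)

points = [parse.(Float64, split(line)) for line in eachline(ARGS[1])]
results = Vector{Float64}(undef, length(points))

Threads.@threads for i in eachindex(points)
    results[i] = objective(points[i])
end

foreach(println, results)

With this, the points of a block are evaluated concurrently inside a single blackbox call, which can give parallelism even when NOMAD itself submits evaluations sequentially.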

SobhanMP commented 2 years ago

I just rebuilt NOMAD from git and get the same behaviour. The parameter file is:

DIMENSION 10
NB_THREADS_OPENMP 4
BB_EXE "$julia ./a.jl"
BB_OUTPUT_TYPE OBJ
BB_INPUT_TYPE * R
LH_SEARCH 4 5
LOWER_BOUND * -10
UPPER_BOUND *  10
and a.jl is:

#!/usr/bin/env julia

# Busy-loop for ~30 seconds, then print an arbitrary number as the objective.
# The input file that NOMAD passes as the first argument is ignored.
now = time()
let c = 0
    while time() - now < 30
        A = randn(100, 100)
        c += sum(A)
    end
    println(c)   # NOMAD reads this as the OBJ value
end

There is only one active core for the first ~2 minutes. I'm not saying that LH doesn't use all of the cores; I'm saying that it doesn't use them in the first iteration of the LH search (the first 4 points in this case). It starts using 4 cores after the first few (4?) evals.

ctribes commented 2 years ago

Thanks for the test. I am able to reproduce the behaviour: the initial LH point evaluations are not parallelized. I have also tested without NB_THREADS_OPENMP and with LH_SEARCH 10 0, and parallel evaluations only start after the first 10 points are submitted. I will investigate what can cause that.

ctribes commented 2 years ago

I found the cause of this behaviour. At some point we decided to move the initialization step of an algorithm out of the OpenMP parallel loop because it could be problematic. I am not sure that this is still the case; I need to test it more thoroughly. In the meantime, I patched a version of the MainStep.cpp file in which the initialization step is inside the OpenMP parallel loop. You can test it here: https://github.com/bbopt/nomad/tree/fix/LH_init_parallel