anyoptimization / pymoo

NSGA2, NSGA3, R-NSGA3, MOEAD, Genetic Algorithms (GA), Differential Evolution (DE), CMAES, PSO
https://pymoo.org
Apache License 2.0

Bad scaling using parallelization when applied to the Mazda car design benchmark problem #266

Closed: dietmarwo closed this issue 2 years ago

dietmarwo commented 2 years ago

pymoo is an excellent new library; there are not many Python open-source libraries that support multiple objectives, constraints, and parallel function evaluation. What I like most is that selection, crossover, mutation, and eliminate_duplicates are defined uniformly for most population-based algorithms, so mixed-integer problems can be supported for all of them.

I successfully applied pymoo to https://github.com/zi-w/Ensemble-Bayesian-Optimization/blob/master/test_functions/rover_function.py. Both Differential Evolution and GA beat, in a few seconds, the Bayesian optimization result from https://arxiv.org/pdf/1706.01445.pdf that was computed using 240 cluster cores. I have no clue why they chose a problem not well suited for surrogate-based methods.

But then I tried to "stretch" pymoo to its limits by applying it to http://ladse.eng.isas.jaxa.jp/benchmark/. I wrapped their C++ code using ctypes to make it callable from Python (a sketch of the wrapper follows below). It is a 222-dimensional problem (discrete decision variables) with 2 objectives and 54 constraints.
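For illustration, a minimal sketch of the ctypes wrapper; the library name libmazda.so and the exported function mazda_eval are placeholders, not the benchmark's actual interface:

    import ctypes
    import numpy as np

    # hypothetical: assumes the C++ benchmark was compiled into libmazda.so
    # exporting void mazda_eval(const double* x, double* f, double* g)
    lib = ctypes.CDLL("./libmazda.so")
    lib.mazda_eval.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),  # x: 222 decision variables
        np.ctypeslib.ndpointer(dtype=np.float64),  # f: 2 objectives (output)
        np.ctypeslib.ndpointer(dtype=np.float64),  # g: 54 constraints (output)
    ]
    lib.mazda_eval.restype = None

    def mazda(x):
        f = np.empty(2)
        g = np.empty(54)
        lib.mazda_eval(np.ascontiguousarray(x, dtype=np.float64), f, g)
        return f, g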

Diagram 8 in https://www.jstage.jst.go.jp/article/tjpnsec/9/2/9_86/_pdf/-char/en shows the Pareto fronts a good optimizer should be able to compute.

It seems only NSGA2 works (more or less). The best configuration I found was:

    # imports for the pymoo 0.5.x API
    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.factory import get_sampling, get_crossover, get_mutation

    nsga2 = NSGA2(
        pop_size=768,
        n_offsprings=10,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

When I use

    pool = multiprocessing.Pool(threads)

I get: "AttributeError: Can't pickle local object 'check_pymoo..MazdaProblem" exception.

    from multiprocessing.pool import ThreadPool
    from pymoo.core.problem import starmap_parallelized_eval

    pool = ThreadPool(threads)
    problem = MazdaProblem(runner=pool.starmap, func_eval=starmap_parallelized_eval)

works, but it scales terribly on my 16-core AMD 5950X CPU:

263 function evals per second using 1 thread
550 function evals per second using 8 threads
600 function evals per second using 16 threads
603 function evals per second using 32 threads
2 x 585 = 1170 function evals per second using 2 runs x 16 threads
4 x 486 = 1944 function evals per second using 4 runs x 8 threads

This means the CPU is best utilized running 4 experiments in parallel, each using 8 threads.

For a non-pymoo optimizer using a different threading model (this optimizer calls exactly the same Python objective function from C++ via ctypes and uses C++ threading) I get:

397 function evals per second using 1 thread
2980 function evals per second using 8 threads
5750 function evals per second using 16 threads
6200 function evals per second using 32 threads
2 x 2450 = 7350 function evals per second using 2 runs x 16 threads
4 x 2165 = 8660 function evals per second using 4 runs x 8 threads

Using more than 16 threads for one run makes no sense, but at 16 threads we get 5750 evals per second compared to 600, a factor of almost 10. The optimizer overhead differs here, but that accounts for only about a factor of 397/263 = 1.5. Utilizing hyperthreading helps, but you have to execute 2 experiments in parallel to further increase the overall eval rate.

At a time when EPYC CPUs are approaching 128 cores / 256 threads, a 16-core mainstream CPU should be supported properly.

I tried two more optimizers:

    # import for the pymoo 0.5.x API
    from pymoo.algorithms.moo.age import AGEMOEA

    algorithm2 = AGEMOEA(
        pop_size=512,
        n_offsprings=10,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

After about 30000 generations the eval rate slows down significantly; it drops below 100 evals/second after 50000 generations. The Mazda problem requires at least 400000 generations. Besides that, the results are worse than NSGA2's after 50000 generations.

    # imports for the pymoo 0.5.x API; n_partitions chosen for illustration
    from pymoo.algorithms.moo.ctaea import CTAEA
    from pymoo.factory import get_reference_directions

    ref_dirs = get_reference_directions("das-dennis", 2, n_partitions=99)

    algorithm3 = CTAEA(ref_dirs=ref_dirs,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

CTAEA is not able to satisfy the 54 constraints of the problem. It soon finds infeasible solutions with very good objective values - values that cannot be reached when the constraints are fulfilled. It seems to ignore the constraints completely.

I tried to create a simpler problem to reproduce the issue, but succeeded only partially:

    import numpy as np
    from pymoo.core.problem import ElementwiseProblem

    class MyProblem(ElementwiseProblem):

        def __init__(self, n, **kwargs):
            super().__init__(n_var=2,
                             n_obj=2,
                             n_constr=n,
                             xl=np.array([-2, -2]),
                             xu=np.array([2, 2]), **kwargs)
            np.random.seed(1)
            self.n = n
            self.p = np.random.rand(n, 5) * 2 + 1

        def _evaluate(self, x, out, *args, **kwargs):
            f1 = 100 * (x[0]**2 + x[1]**2)
            f2 = (x[0] - 1)**2 + x[1]**2
            p = self.p
            g = [p[i, 0] * (x[0] + p[i, 1]) * p[i, 2] * (x[1] - p[i, 3] / p[i, 4]) for i in range(self.n)]

            out["F"] = [f1, f2]
            out["G"] = g

With n=20 and pop_size=40 I find feasible solutions, but the Pareto front has a gigantic hole compared to the NSGA2 result.
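For reference, a sketch of the comparison run (default real-valued operators for this continuous toy problem; the generation budget is an illustrative choice):

    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.optimize import minimize

    problem = MyProblem(20)
    algorithm = NSGA2(pop_size=40)  # default real-coded SBX/PM operators
    res = minimize(problem, algorithm, ("n_gen", 1000), seed=1, verbose=False)
    # res.F holds the objective values of the final feasible non-dominated set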

Do you have recommendations on which algorithm I could try? Did I miss something in the configuration? Are CTAEA/AGEMOEA only suited to low-dimensional problems with few constraints?

blankjul commented 2 years ago

Hi Dietmar and thanks for using pymoo! Great that you have already experimented with different algorithms, as this would have been my first advice.

1. Serialization error:

"AttributeError: Can't pickle local object 'check_pymoo..MazdaProblem" exception.

This error is usually thrown if something in the Problem object cannot be serialized. Do you have save_history=True? I think I talked to someone with a similar issue where this was the cause.

Otherwise, you can exclude an attribute from serialization in the problem's constructor:

    self.exclude_from_serialization = self.exclude_from_serialization + ["attribute"]
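For instance, with a ctypes wrapper like the one above, the unpicklable library handle could be excluded (the attribute name lib is just for illustration):

    import ctypes
    from pymoo.core.problem import ElementwiseProblem

    class MazdaProblem(ElementwiseProblem):

        def __init__(self, **kwargs):
            super().__init__(n_var=222, n_obj=2, n_constr=54, **kwargs)
            self.lib = ctypes.CDLL("./libmazda.so")  # handle cannot be pickled
            # keep the handle out of serialization so the problem can be pickled
            self.exclude_from_serialization = self.exclude_from_serialization + ["lib"]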

2. Scaling: In your scaling study, have you also played with n_offsprings? This is the number of solutions that can truly be parallelized when the whole offspring population is evaluated. If yes, how many physical cores does your machine have? You can also try using processes instead of threads and see if this performs better (see the sketch below).
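A minimal sketch of the process-based variant, following the same runner pattern as the ThreadPool example above (this assumes the problem class is defined at module level so it can be pickled):

    import multiprocessing
    from pymoo.core.problem import starmap_parallelized_eval

    # one worker per physical core; n_offsprings should be at least this large,
    # otherwise some workers idle while a generation's offspring are evaluated
    pool = multiprocessing.Pool(16)
    problem = MazdaProblem(runner=pool.starmap, func_eval=starmap_parallelized_eval)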

3. Constraint Handling: You have already chosen algorithms that have been proposed to handle constraints. During the last few years I have been (and still am) hoping that someone doing research on constraint handling might want to contribute a few more methods.

For an application problem, I would start by analyzing which constraints cannot be satisfied. Also, constraint normalization can be helpful if their scales are entirely different.
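As a sketch of what I mean by normalization (evaluate_constraints and random_designs stand in for your own sampling code):

    import numpy as np

    # estimate a typical magnitude per constraint from random designs, then
    # divide G by it so all 54 constraints contribute on a comparable scale
    G = np.array([evaluate_constraints(x) for x in random_designs])
    g_scale = np.maximum(np.abs(G).mean(axis=0), 1e-12)  # avoid division by zero

    # inside _evaluate:
    #     out["G"] = g / g_scale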

I hope this was helpful. Let me know if you would like to have a more detailed discussion (maybe in a meeting).