anyoptimization / pymoo

NSGA2, NSGA3, R-NSGA3, MOEAD, Genetic Algorithms (GA), Differential Evolution (DE), CMAES, PSO
https://pymoo.org
Apache License 2.0

Bad scaling using parallelization when applied to the Mazda car design benchmark problem #266

Closed: dietmarwo closed this issue 2 years ago

dietmarwo commented 2 years ago

pymoo is an excellent new library; there are not many Python open-source libraries that support multiple objectives, constraints, and parallel function evaluation. What I like most is that selection, crossover, mutation, and eliminate_duplicates are defined uniformly for most population-based algorithms, so mixed-integer problems can be supported for all of them.

I successfully applied pymoo to https://github.com/zi-w/Ensemble-Bayesian-Optimization/blob/master/test_functions/rover_function.py. Both Differential Evolution and GA beat, in a few seconds, the Bayesian optimization result from https://arxiv.org/pdf/1706.01445.pdf that was computed using 240 cluster cores. I have no clue why they chose a problem not well suited for surrogate-based methods.

But then I tried to "stretch" pymoo to its limits by applying it to http://ladse.eng.isas.jaxa.jp/benchmark/. I wrapped their C++ code using ctypes to make it callable from Python (a sketch of the wrapper follows below). It is a 222-dimensional problem (discrete decision variables) with 2 objectives and 54 constraints.
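For illustration, a minimal sketch of the ctypes wrapper; the library name libmazda.so and the exported function mazda_eval are placeholders, not the benchmark's actual interface:

    import ctypes
    import numpy as np

    # hypothetical: assumes the C++ benchmark was compiled into libmazda.so
    # exporting void mazda_eval(const double* x, double* f, double* g)
    lib = ctypes.CDLL("./libmazda.so")
    lib.mazda_eval.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),  # x: 222 decision variables
        np.ctypeslib.ndpointer(dtype=np.float64),  # f: 2 objectives (output)
        np.ctypeslib.ndpointer(dtype=np.float64),  # g: 54 constraints (output)
    ]
    lib.mazda_eval.restype = None

    def mazda(x):
        f = np.empty(2)
        g = np.empty(54)
        lib.mazda_eval(np.ascontiguousarray(x, dtype=np.float64), f, g)
        return f, g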

Diagram 8 in https://www.jstage.jst.go.jp/article/tjpnsec/9/2/9_86/_pdf/-char/en shows the Pareto fronts a good optimizer should be able to compute.

It seems only NSGA2 works (more or less). The best configuration I found was:

    # imports for the pymoo 0.5.x API
    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.factory import get_sampling, get_crossover, get_mutation

    nsga2 = NSGA2(
        pop_size=768,
        n_offsprings=10,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

When I use

    pool = multiprocessing.Pool(threads)

I get: "AttributeError: Can't pickle local object 'check_pymoo..MazdaProblem" exception.

    from multiprocessing.pool import ThreadPool
    from pymoo.core.problem import starmap_parallelized_eval

    pool = ThreadPool(threads)
    problem = MazdaProblem(runner=pool.starmap, func_eval=starmap_parallelized_eval)

works, but it scales terribly on my 16-core AMD 5950X CPU:

263 function evals per second using 1 thread
550 function evals per second using 8 threads
600 function evals per second using 16 threads
603 function evals per second using 32 threads
2 x 585 = 1170 function evals per second using 2 runs x 16 threads
4 x 486 = 1944 function evals per second using 4 runs x 8 threads

This means the CPU is best utilized running 4 experiments in parallel, each using 8 threads.

For a non-pymoo optimizer using a different threading model (this optimizer calls exactly the same Python objective function from C++ via ctypes and uses C++ threading) I get:

397 function evals per second using 1 thread
2980 function evals per second using 8 threads
5750 function evals per second using 16 threads
6200 function evals per second using 32 threads
2 x 2450 = 7350 function evals per second using 2 runs x 16 threads
4 x 2165 = 8660 function evals per second using 4 runs x 8 threads

Using more than 16 threads for one run makes no sense, but at 16 threads we get 5750 evals per second compared to 600, a factor of almost 10. The optimizer overhead differs here, but that accounts for only about a factor of 397/263 = 1.5. Utilizing hyperthreading helps, but you have to execute 2 experiments in parallel to further increase the overall eval rate.

At a time when EPYC CPUs are approaching 128 cores / 256 threads, a 16-core mainstream CPU should be supported properly.

I tried two more optimizers:

    # import for the pymoo 0.5.x API
    from pymoo.algorithms.moo.age import AGEMOEA

    algorithm2 = AGEMOEA(
        pop_size=512,
        n_offsprings=10,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

After about 30000 generations the eval rate slows down significantly; it drops below 100 evals/second after 50000 generations. The Mazda problem requires at least 400000 generations. Besides that, the results are worse than NSGA2's after 50000 generations.

    # imports for the pymoo 0.5.x API; n_partitions chosen for illustration
    from pymoo.algorithms.moo.ctaea import CTAEA
    from pymoo.factory import get_reference_directions

    ref_dirs = get_reference_directions("das-dennis", 2, n_partitions=99)

    algorithm3 = CTAEA(ref_dirs=ref_dirs,
        sampling=get_sampling("int_random"),
        crossover=get_crossover("int_sbx", prob=0.9, eta=15),
        mutation=get_mutation("int_pm", eta=20),
        eliminate_duplicates=True
    )

CTAEA is not able to satisfy the 54 constraints of the problem. It soon finds infeasible solutions with very good objective values - values that cannot be reached when the constraints are fulfilled. It seems to ignore the constraints completely.

I tried to create a simpler problem to reproduce the issue, but succeeded only partially:

    import numpy as np
    from pymoo.core.problem import ElementwiseProblem

    class MyProblem(ElementwiseProblem):

        def __init__(self, n, **kwargs):
            super().__init__(n_var=2,
                             n_obj=2,
                             n_constr=n,
                             xl=np.array([-2, -2]),
                             xu=np.array([2, 2]), **kwargs)
            np.random.seed(1)
            self.n = n
            self.p = np.random.rand(n, 5) * 2 + 1

        def _evaluate(self, x, out, *args, **kwargs):
            f1 = 100 * (x[0]**2 + x[1]**2)
            f2 = (x[0] - 1)**2 + x[1]**2
            p = self.p
            g = [p[i, 0] * (x[0] + p[i, 1]) * p[i, 2] * (x[1] - p[i, 3] / p[i, 4]) for i in range(self.n)]

            out["F"] = [f1, f2]
            out["G"] = g

With n=20 and pop_size=40 I find feasible solutions, but the Pareto front has a gigantic hole compared to the NSGA2 result.
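For reference, a sketch of the comparison run (default real-valued operators for this continuous toy problem; the generation budget is an illustrative choice):

    from pymoo.algorithms.moo.nsga2 import NSGA2
    from pymoo.optimize import minimize

    problem = MyProblem(20)
    algorithm = NSGA2(pop_size=40)  # default real-coded SBX/PM operators
    res = minimize(problem, algorithm, ("n_gen", 1000), seed=1, verbose=False)
    # res.F holds the objective values of the final feasible non-dominated set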

Do you have recommendations on which algorithm I could try? Did I miss something in the configuration? Are CTAEA/AGEMOEA only suited to low-dimensional problems with few constraints?

blankjul commented 2 years ago

Hi Dietmar and thanks for using pymoo! Great that you have already experimented with different algorithms, as this would have been my first advice.

1. Serialization error:

"AttributeError: Can't pickle local object 'check_pymoo..MazdaProblem" exception.

This error is usually thrown if something in the Problem object cannot be serialized. Do you have save_history=True? I think I talked to someone with a similar issue where this was the cause.

Otherwise, you can exclude an attribute from serialization in the problem's constructor:

    self.exclude_from_serialization = self.exclude_from_serialization + ["attribute"]
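For instance, with a ctypes wrapper like the one above, the unpicklable library handle could be excluded (the attribute name lib is just for illustration):

    import ctypes
    from pymoo.core.problem import ElementwiseProblem

    class MazdaProblem(ElementwiseProblem):

        def __init__(self, **kwargs):
            super().__init__(n_var=222, n_obj=2, n_constr=54, **kwargs)
            self.lib = ctypes.CDLL("./libmazda.so")  # handle cannot be pickled
            # keep the handle out of serialization so the problem can be pickled
            self.exclude_from_serialization = self.exclude_from_serialization + ["lib"]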

2. Scaling: In your scaling study, have you also played with n_offsprings? This is the number of solutions that can truly be parallelized when the whole offspring population is evaluated. If yes, how many physical cores does your machine have? You can also try using processes instead of threads and see if this performs better (see the sketch below).
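A minimal sketch of the process-based variant, following the same runner pattern as the ThreadPool example above (this assumes the problem class is defined at module level so it can be pickled):

    import multiprocessing
    from pymoo.core.problem import starmap_parallelized_eval

    # one worker per physical core; n_offsprings should be at least this large,
    # otherwise some workers idle while a generation's offspring are evaluated
    pool = multiprocessing.Pool(16)
    problem = MazdaProblem(runner=pool.starmap, func_eval=starmap_parallelized_eval)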

3. Constraint Handling: You have already chosen algorithms that have been proposed to handle constraints. During the last few years I have been (and still am) hoping that someone doing research on constraint handling might want to contribute a few more methods.

For an application problem, I would start by analyzing which constraints cannot be satisfied. Also, constraint normalization can be helpful if their scales are entirely different.
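As a sketch of what I mean by normalization (evaluate_constraints and random_designs stand in for your own sampling code):

    import numpy as np

    # estimate a typical magnitude per constraint from random designs, then
    # divide G by it so all 54 constraints contribute on a comparable scale
    G = np.array([evaluate_constraints(x) for x in random_designs])
    g_scale = np.maximum(np.abs(G).mean(axis=0), 1e-12)  # avoid division by zero

    # inside _evaluate:
    #     out["G"] = g / g_scale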

I hope this was helpful. Let me know if you would like to have a more detailed discussion (maybe in a meeting).