anyoptimization / pymoo

NSGA2, NSGA3, R-NSGA3, MOEAD, Genetic Algorithms (GA), Differential Evolution (DE), CMAES, PSO
https://pymoo.org
Apache License 2.0

save_history=True does not save all function evaluations (for at least some problem settings) #446

Closed · thchang closed 1 year ago

thchang commented 1 year ago

I ran the following script on my laptop (and several other environments), and it prints that the length of my final history is only 64. Since I ran 10 generations with a population size of 10, I was expecting to see a history of length 100. From the intermediate print statements, I can see that 100 evaluations were performed.

Is there a way to query the entire history (every function evaluation ever taken)?

import numpy as np 
from pymoo.algorithms.moo.nsga2 import NSGA2, RankAndCrowdingSurvival
from pymoo.core.mixed import MixedVariableGA, MixedVariableMating, MixedVariableSampling, MixedVariableDuplicateElimination
from pymoo.optimize import minimize
from pymoo.core.problem import ElementwiseProblem
from pymoo.core.variable import Real, Integer, Choice, Binary

class TestProblem(ElementwiseProblem):

    def __init__(self, **kwargs):
        variables = {
            "x1": Real(bounds=(0.0, 1.0)),
            "c1": Choice(options=["good", "med", "bad"]),
            "b1": Binary(),
            "i1": Integer(bounds=(0, 4)),
        }
        super().__init__(vars=variables, n_obj=2, n_ieq_constr=0, **kwargs)
        self.count = 0

    def _evaluate(self, X, out, *args, **kwargs):
        # simple separable bi-objective function over the mixed variables
        sx = np.zeros(2)
        base = 0.0 
        if X['c1'] == "med":
            base += 1
        elif X['c1'] == "bad":
            base += 2
        if X['b1']: 
            base += 1
        base += np.abs(X['i1'] - 2)
        sx[0] = base + X['x1']**2
        sx[1] = base + (1.0 - X['x1'])**2
        out["F"] = sx.copy()
        self.count += 1
        print(f"finished evaluation {self.count}: {sx}")

# Fix random seed
SEED = 0

# Solve w/ NSGA-II with MixedVariable settings in pymoo (10 gens with pop size 10 = 100 evals)
algorithm = NSGA2(pop_size=10,
                  sampling=MixedVariableSampling(),
                  mating=MixedVariableMating(eliminate_duplicates=MixedVariableDuplicateElimination()),
                  eliminate_duplicates=MixedVariableDuplicateElimination(),
                  #survival=RankAndCrowdingSurvival()
                  )
res = minimize(TestProblem(),
               algorithm,
               ("n_gen", 10),
               save_history=True,
               seed=SEED,
               verbose=False)

# Extract all objective values
obj_vals = []
for row in res.history:
    for fi in row.result().F:
        obj_vals.append(fi)
obj_vals = np.asarray(obj_vals)

# Dump results
print(f"Len history: {len(res.history)} gens")
print(f"Num evals in history: {len(obj_vals)}")
blankjul commented 1 year ago

save_history saves a copy of the algorithm object in each iteration. Each result() call on those snapshots then returns the object that would have been returned if the algorithm had terminated at that point. The result, however, does not correspond one-to-one to the function evaluations.
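
For completeness: the evaluation counts are still recoverable from the history, since each snapshot carries its evaluator. A minimal sketch, assuming the script above has been run (evaluator.n_eval is the evaluator's running counter, as used in pymoo's convergence examples):

# each history entry is a snapshot of the algorithm, so the running
# evaluation counter can be read back per generation
n_evals = [entry.evaluator.n_eval for entry in res.history]
print(f"generations saved: {len(res.history)}")
print(f"total evaluations: {n_evals[-1]}")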

You can provide a callback to the evaluator though:

import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2, RankAndCrowdingSurvival
from pymoo.core.evaluator import Evaluator
from pymoo.core.mixed import MixedVariableGA, MixedVariableMating, MixedVariableSampling, \
    MixedVariableDuplicateElimination
from pymoo.core.population import Population
from pymoo.optimize import minimize
from pymoo.core.problem import ElementwiseProblem
from pymoo.core.variable import Real, Integer, Choice, Binary

class TestProblem(ElementwiseProblem):

    def __init__(self, **kwargs):
        variables = {
            "x1": Real(bounds=(0.0, 1.0)),
            "c1": Choice(options=["good", "med", "bad"]),
            "b1": Binary(),
            "i1": Integer(bounds=(0, 4)),
        }
        super().__init__(vars=variables, n_obj=2, n_ieq_constr=0, **kwargs)
        self.count = 0

    def _evaluate(self, X, out, *args, **kwargs):
        sx = np.zeros(2)
        base = 0.0
        if X['c1'] == "med":
            base += 1
        elif X['c1'] == "bad":
            base += 2
        if X['b1']:
            base += 1
        base += np.abs(X['i1'] - 2)
        sx[0] = base + X['x1'] ** 2
        sx[1] = base + (1.0 - X['x1']) ** 2
        out["F"] = sx.copy()
        self.count += 1
        print(f"finished evaluation {self.count}: {sx}")

class MyCallback:

    def __init__(self) -> None:
        super().__init__()
        self.sols = []

    def __call__(self, sols):
        # called by the Evaluator with every batch of freshly evaluated solutions
        self.sols.extend(sols)

algorithm = NSGA2(pop_size=10,
                  sampling=MixedVariableSampling(),
                  mating=MixedVariableMating(eliminate_duplicates=MixedVariableDuplicateElimination()),
                  eliminate_duplicates=MixedVariableDuplicateElimination(),
                  evaluator=Evaluator(callback=MyCallback())
                  )

res = minimize(TestProblem(),
               algorithm,
               ("n_gen", 10),
               seed=0,
               copy_algorithm=False,
               verbose=False)

sols = Population.create(*algorithm.evaluator.callback.sols)

print(f"Len Sols: {len(sols)} gens")

This callback will then be triggered whenever a solution (or a batch of solutions) is evaluated.
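
From the collected population, you can then pull the stacked values out as arrays via Population.get(), e.g.:

# stacked attributes of all collected solutions
X = sols.get("X")   # design variables (dicts for mixed-variable problems)
F = sols.get("F")   # objective values, one row per evaluated solution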

thchang commented 1 year ago

@blankjul thanks for the reply, this makes sense. To keep moving forward, I just added a line to my _evaluate() method to dump each result to a CSV file.
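
For reference, that workaround is just one extra line at the end of _evaluate(), roughly like this (filename arbitrary):

# append one row per evaluation (a workaround, not a pymoo feature)
with open("all_evals.csv", "a") as fp:
    fp.write(",".join(str(v) for v in sx) + "\n")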

As a note for your group, it might be nice to eventually offer explicit support for saving all evaluations: it makes performance comparisons easier, and it is also helpful for applications where the dataset produced by the optimization has intrinsic value beyond just finding the Pareto solutions.

blankjul commented 1 year ago

@thchang I agree that this can be useful for some problems. However, for many problems, storing every evaluation easily means keeping thousands of solutions in memory. For problems that are not very computationally expensive (it seems like yours is one), I would usually recommend keeping an archive (maybe of size 1000 or 10000) to limit the overall size.
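
One simple way to get that with the callback mechanism from above is to bound the storage, e.g. with a deque (plain Python, not a pymoo class; this keeps the most recent solutions rather than the best ones):

from collections import deque

class BoundedArchiveCallback:
    # keeps only the most recent `maxlen` evaluated solutions to cap memory

    def __init__(self, maxlen=1000):
        self.sols = deque(maxlen=maxlen)

    def __call__(self, sols):
        self.sols.extend(sols)

# plug in exactly like MyCallback above:
# evaluator=Evaluator(callback=BoundedArchiveCallback(maxlen=1000))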

By the way, the hack you are using is the simplest fix for this (if you only care about storing the X values and not the individuals).