anyoptimization / pymoo

NSGA2, NSGA3, R-NSGA3, MOEAD, Genetic Algorithms (GA), Differential Evolution (DE), CMAES, PSO
https://pymoo.org
Apache License 2.0

What is the right way to skip none solutions in each generation. #474

Closed miraclema999 closed 11 months ago

miraclema999 commented 1 year ago

I am using pymoo, linked to simulation tools, to optimize building forms and find near-Pareto-front solutions for daylighting and energy consumption. Data is exchanged via CSV files, and the core code runs in Rhinoceros. Since the simulation tools are not stable, individual solutions sometimes fail to return one or two objective values (if I run the simulation again, the value may be retrieved). I mark these objective values as 'None'. Because these failures come from the tools rather than the solutions themselves, I want to avoid discarding such solutions entirely; instead I would like to keep track of them and give them a chance in subsequent generations. My code is below. I use a vectorized matrix operation for parallelization, the optimization runs step by step, controlled by the existence of an external txt file, and I use a callback to store the solutions of every generation. My questions are:

  1. I use the code if np.any(fitness_values == None): continue to skip solutions — is this the right way?
  2. When I print the fitness_values retrieved from result.F, the objective values differ from the ones recorded by the callback. The values in the callback are right, but the per-generation values retrieved from result.F seem weird (that is why I suspect my code is wrong). The difference is like this: if I manually set the objectives of the 5th solution to None, the objective values of the first four solutions retrieved in the two ways are identical, but the 5th solution's objectives retrieved from result.F are not None, and from the 5th to the 50th solution the values retrieved from result.F are wrong.
import os
import time
import numpy as np
from pymoo.core.problem import Problem
from pymoo.termination import get_termination
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.operators.crossover.sbx import SBX
from pymoo.operators.mutation.pm import PM
from pymoo.operators.sampling.rnd import FloatRandomSampling
from pymoo.core.population import Population
from pymoo.core.evaluator import Evaluator

n_var=5
n_obj=3
n_ieq_constr=0
xl=[8,0,0,0,0.2]
xu=[37,1,1,1,0.95]
pop_size=50
n_gen=60
seed=1
sleep_time=2

running_file_path = "D:/Zhu/Optimize/Hops/running.txt"

def is_running():
    return os.path.exists(running_file_path)

class TestProblem(Problem):
    def __init__(self, **kwargs):
        super().__init__(
                         n_var=n_var,
                         n_obj=n_obj,
                         n_ieq_constr=n_ieq_constr,
                         xl=xl,
                         xu=xu,
                         )

    def _evaluate(self, x, out, *args, **kwargs):
        # Step 1: Send data to Grasshopper
        data=np.array(x)
        file_path = "D:/Zhu/Optimize/Hops/Params.csv"  # Set the file path
        np.savetxt(file_path, data, delimiter=",", fmt="%f")

        # Step 2: Wait for result in Python
        result_file_path = "D:/Zhu/Optimize/Hops/Res.csv"  # Path to the result file
        initial_timestamp = os.path.getmtime(result_file_path)
        while True:
            time.sleep(sleep_time)  # poll every sleep_time seconds
            current_timestamp = os.path.getmtime(result_file_path)
            if current_timestamp != initial_timestamp:
                data = np.genfromtxt(result_file_path, delimiter=',')
                if data.shape == (pop_size, n_obj):
                    break
        # Step 3: Read result files in Python
        data = np.genfromtxt(result_file_path, delimiter=',')
        # Mark failed objectives: a float array cannot hold None, so use NaN
        data[data == 999] = np.nan
        out["F"] = data

class MyCallback:

    def __init__(self) -> None:
        super().__init__()
        self.sols = []

    def __call__(self, sols):
        self.sols.extend(sols)
        print()

problem=TestProblem()

algorithm = NSGA2(
    pop_size=pop_size,
    sampling=FloatRandomSampling(),
    crossover=SBX(prob=0.9, eta=15),
    mutation=PM(eta=20),
    eliminate_duplicates=True,
    evaluator=Evaluator(callback=MyCallback())
    )

termination = get_termination("n_gen", n_gen)

Run = True  # set to False to skip the optimization

if Run:
    open(running_file_path, 'w').close()
    algorithm.setup(problem, termination=termination, seed=seed)
    while is_running():
        algorithm.next()
        if algorithm.termination.has_terminated():
            break
        result = algorithm.result()
        optimal_solutions = result.X
        optimal_objectives = result.F

        for solution, fitness_values in zip(optimal_solutions, optimal_objectives):
            print(fitness_values)
            # Skip solutions with missing objectives; NaN (not None) marks them
            # in a float array, so np.isnan is the reliable test
            if np.isnan(fitness_values).any():
                continue

    sols = Population.create(*algorithm.evaluator.callback.sols)

    # Collect the per-solution history as nested lists (one row per solution)
    X_history = np.array([s.X for s in sols]).tolist()
    F_history = np.array([s.F for s in sols]).tolist()
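
As an aside on question 1: once the objectives are stored in a float array, None becomes NaN, and the == None comparison never matches it. A minimal demonstration of this pitfall, assuming NumPy's standard elementwise comparison semantics:

```python
import numpy as np

# Objectives read back from a CSV: a missing value becomes NaN, not None.
fitness_values = np.array([0.3, np.nan, 0.7])

# Comparing a float array against None never matches NaN entries,
# so this guard silently lets incomplete solutions through.
print(np.any(fitness_values == None))   # → False

# np.isnan is the reliable check for missing float objectives.
print(np.isnan(fitness_values).any())   # → True
```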
blankjul commented 1 year ago

What about simply setting the objective(s) to a very large value, e.g. 1e12 or so? I am not exactly sure what you are trying to do in your code, but for the algorithm it would also be okay to consider a solution simply as infeasible if the evaluation fails (also referred to as a death penalty). This might be the cleanest way of implementing this.

If you know before running your third-party code a solution will fail, you can also filter them out beforehand.
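
The penalty idea above can be sketched as follows; penalize_failures and PENALTY are hypothetical names, and NaN is assumed to mark a failed objective:

```python
import numpy as np

PENALTY = 1e12  # large "death penalty" for failed evaluations (assumed value)

def penalize_failures(F):
    """Replace any row with a missing (NaN) objective by the penalty value.

    A rank-based algorithm such as NSGA-II will then sort these solutions
    last, which is effectively the death-penalty approach described above.
    """
    F = np.array(F, dtype=float)          # copy so the caller's array is untouched
    failed = np.isnan(F).any(axis=1)      # rows where the simulation failed
    F[failed] = PENALTY                   # push them to the end of the ranking
    return F

# Example: the 2nd solution failed to return its first objective.
F = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [0.5, 0.8]])
print(penalize_failures(F))
```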

miraclema999 commented 1 year ago

The fact is, because of the instability of the simulation software, it is impossible to predict which solutions will fail, and a solution with the same variables may be OK if I run it again. So if I set the objectives to a very large value, I worry that the algorithm will not consider these solutions in the following generations, and I think that will affect the final search space. I am not sure whether my assumption is right or wrong. If a large penalty on these solutions doesn't affect the final search space, then I agree it is the cleanest way.

blankjul commented 1 year ago

It sounds to me that your problem is not that sometimes you don't have a solution (because the simulation crashes), but rather that your evaluation function is non-deterministic. You are right that NSGA-II assumes a deterministic function and will therefore not re-evaluate a solution.

What about adding retry if it fails and just assuming after n attempts it truly fails?
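
A minimal sketch of that retry idea; evaluate_with_retry, run_simulation, and n_attempts are hypothetical names, and NaN is assumed to mark a failed objective:

```python
import numpy as np

def evaluate_with_retry(run_simulation, x, n_attempts=3):
    """Call the (non-deterministic) simulation up to n_attempts times.

    run_simulation stands in for the external simulation call; it is
    assumed to return a 1-D array of objectives, with NaN marking a
    failed objective. After n_attempts the last (failed) result is
    returned so the caller can penalize it.
    """
    for _ in range(n_attempts):
        f = np.asarray(run_simulation(x), dtype=float)
        if not np.isnan(f).any():
            return f          # clean evaluation, accept it
    return f                  # truly failed after n_attempts

# Toy simulation that fails on the first call and succeeds afterwards.
calls = {"n": 0}
def flaky_sim(x):
    calls["n"] += 1
    return [np.nan, 1.0] if calls["n"] == 1 else [0.5, 1.0]

print(evaluate_with_retry(flaky_sim, x=[0.1]))  # → [0.5 1. ]
```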

jacktang commented 1 year ago

Hello @miraclema999, I would suggest you create a surrogate model for the simulation, if possible.

miraclema999 commented 1 year ago

What about adding retry if it fails and just assuming after n attempts it truly fails?

Sorry for the late reply. I agree with your suggestion; this way the number of solutions in each generation will not be affected. To be honest, I'm not very good at scripting. Can you tell me how to modify this script so that it retries the failed evaluations? Thank you.

miraclema999 commented 1 year ago

Hello @miraclema999 , I would suggest you to create surrogate model for the simulation if possible.

Thank you, bro. Did you mean that I should switch to another algorithm?

jacktang commented 1 year ago

@miraclema999 I meant creating a surrogate model from samples generated by the simulation software, replacing the simulation with it, and finally verifying the result with the simulation software. See the explanation of surrogate models on Wikipedia. Here are the main steps:

  1. Generate samples using simulation software
  2. Create surrogate models by using the samples, and validate/test the surrogate model to make sure the quality
  3. Replace the simulation by surrogate model in optimization program
  4. Validate the optimization result in simulation software.
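
The steps above can be sketched with a toy example; the quadratic stand-in simulation and the polynomial surrogate are illustrative assumptions, not a recommendation for a specific surrogate type:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: generate samples from the (expensive) simulation.
# Here a cheap analytic function stands in for the real simulator.
def simulation(x):
    return 1.0 + 2.0 * x + 0.5 * x**2

X_train = rng.uniform(0, 1, 50)
y_train = simulation(X_train)

# Step 2: fit a simple quadratic surrogate and check its quality
# on held-out points before trusting it.
coeffs = np.polyfit(X_train, y_train, deg=2)
surrogate = np.poly1d(coeffs)
X_test = rng.uniform(0, 1, 20)
max_err = np.max(np.abs(surrogate(X_test) - simulation(X_test)))
print(f"max surrogate error: {max_err:.2e}")

# Steps 3/4: the optimizer then calls `surrogate` instead of `simulation`,
# and the final candidate solutions are re-verified with the real simulator.
```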