ahmedfgad / GeneticAlgorithmPython

Source code of PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).
https://pygad.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Manipulate solution before saving it as parent #271

Status: Open. Overdrivr opened this issue 10 months ago.

Overdrivr commented 10 months ago

Hi,

Thank you for this wonderful lib. I'm experimenting with PyGAD on a scheduling problem, and I'm facing a bit of difficulty.

My fitness function performs a multi-agent simulation (using python lib "mesa"), takes an input planning and returns a single score.

The challenge I'm facing is that the input ("theoretical") planning that PyGAD passes to the fitness function is not usable 1:1 by the agents. For instance, if the planning tells an agent to do something impossible (say, starting operation B before operation A is complete), the agent skips that operation and moves on to the next doable one.
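To illustrate, the agents' "skip and retry later" behaviour is roughly the following. (The operation names, the PRECEDENCE table, and the repair() function are hypothetical sketches for this issue, not part of mesa or my actual model.)

```python
# Hypothetical sketch of how agents reorder an infeasible planning.
PRECEDENCE = {"B": "A", "D": "C"}  # operation -> required prerequisite

def repair(planning):
    """Move each operation after its prerequisite, mimicking agents that
    skip undoable operations and come back to them later."""
    result = []
    deferred = []
    for op in planning:
        prereq = PRECEDENCE.get(op)
        if prereq is not None and prereq not in result:
            deferred.append(op)  # not doable yet; the agent skips it
            continue
        result.append(op)
        # Retry deferred operations whose prerequisite is now done.
        still_deferred = []
        for d in deferred:
            if PRECEDENCE[d] in result:
                result.append(d)
            else:
                still_deferred.append(d)
        deferred = still_deferred
    return result + deferred  # anything still blocked goes last

print(repair(["B", "A", "C", "D"]))  # -> ['A', 'B', 'C', 'D']
```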

At this point, PyGAD does converge to a solution, but it's a really bad one: much worse than a planning generated with basic heuristics.

I have a couple of questions about this:

Thanks a lot for your help!

ahmedfgad commented 10 months ago

Hi @Overdrivr,

Thanks for using PyGAD!

Here are the answers to your questions:

  1. Yes, this is doable. PyGAD has a lifecycle that allows you to monitor and edit many things; check the lifecycle at this link. If you want to edit the parents, implement the on_parents() function/method and make whatever edits you like.
  2. Gradient-based optimizers work only when the problem is formulated as a differentiable function. In your case, you only have a set of constraints, not such a function, so a population-based optimizer like the genetic algorithm is a good choice. If you search this topic, I am sure you will find material on scheduling with genetic algorithms.
Overdrivr commented 9 months ago

Hi @ahmedfgad, thanks for your reply! I already tried to experiment with the on_parents() hook, but I could not find where the selected parents were stored on the GA instance (so that I could replace them with my own). Could you point me in the right direction?

ahmedfgad commented 9 months ago

It is straightforward.

on_parents() accepts:

  1. The pygad.GA instance. Use it to retrieve whatever you want from the GA.
  2. The parents.

You only need to edit the second argument. After you finish, you have to return:

  1. The selected parents.
  2. The parents' indices. Such indices are usable only if the parents are taken from the population as-is. In your case, you will change the parents, so you can set the indices to anything.

This is an example. It changes all the parents to [999, 999, 999, 999, 999, 999] and sets the indices to zeros. If you run the code, the fitness plot will be just a flat line, because the new static parents kill the evolution.

import pygad
import numpy

function_inputs = [4, -2, 3.5, 5, -11, -4.7]  # Function inputs.
desired_output = 44  # Desired function output.

def fitness_func(ga_instance, solution, solution_idx):
    output = numpy.sum(solution * function_inputs)
    fitness = 1.0 / (numpy.abs(output - desired_output) + 0.000001)
    return fitness

def on_parents(ga_instance, parents):
    # Overwrite every selected parent with the same static solution.
    for idx in range(parents.shape[0]):
        parents[idx, :] = [999] * parents.shape[1]
    # The parents no longer come from the population, so the indices
    # are meaningless; set them all to 0.
    indices = [0] * parents.shape[0]
    return parents, indices

ga_instance = pygad.GA(num_generations=20,
                       num_parents_mating=5,
                       sol_per_pop=10,
                       num_genes=len(function_inputs),
                       fitness_func=fitness_func,
                       on_parents=on_parents,
                       suppress_warnings=True)

ga_instance.run()

ga_instance.plot_fitness()
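For a scheduling problem like the one in this issue, a more realistic hook would repair each parent instead of replacing it with a constant. Here is a minimal sketch, assuming the genes are real values that must be clipped into a feasible range; the bounds and the clipping policy are illustrative assumptions, not anything PyGAD enforces:

```python
import numpy

GENE_MIN, GENE_MAX = -10.0, 10.0  # hypothetical feasibility bounds

def on_parents(ga_instance, parents):
    # Repair the selected parents by clipping each gene into range.
    repaired = numpy.clip(parents, GENE_MIN, GENE_MAX)
    # The repaired parents may no longer match any population member,
    # so the original indices are meaningless; reuse 0..N-1.
    indices = list(range(repaired.shape[0]))
    return repaired, indices
```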
Overdrivr commented 7 months ago

Thanks @ahmedfgad, much clearer! One last question, which I couldn't find documented anywhere: in your example you force all returned indices to zero. What's the impact of this? Should I expect to need to return something else in a more realistic situation?

ahmedfgad commented 7 months ago

The above code works well only if you set both of these parameters to 0:

  1. keep_elitism=0
  2. keep_parents=0

If at least one of them is non-zero, then you are right: the code should not use zero as the index for all the new parents. Let me explain how to handle this.

To be on the safe side, you should update the following attributes according to the new parents:

  1. last_generation_parents_as_list: A list of the parents of the last generation. Update it to match the new parents. Its dtype must be list.
  2. last_generation_parents_indices: The parents' indices. Indices usually start from 0, so with 4 parents they range from 0 to 3.
  3. last_generation_elitism_as_list: A list of the elitism (best solutions) of the last generation. Update it if any new parent is better than the current elitism. Its dtype must be list.
  4. last_generation_elitism_indices: The indices of the elitism.
  5. previous_generation_fitness: A list of fitness values; update it according to the fitness values of the new parents.

This way you can be 100% sure that the new parents will have no adverse impact on the algorithm.
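The attribute updates above can be sketched as a small helper. The sync_parent_attributes() name, the stand-in object, and the policy of promoting the single best new parent to the elitism are illustrative assumptions for this sketch, not PyGAD API:

```python
import numpy
from types import SimpleNamespace

def sync_parent_attributes(ga_instance, new_parents, new_fitness):
    # Keep pygad.GA bookkeeping consistent after replacing the parents
    # inside on_parents(). The attribute names follow the list above;
    # the update policy here is a simplification.
    ga_instance.last_generation_parents_as_list = new_parents.tolist()
    ga_instance.last_generation_parents_indices = list(range(new_parents.shape[0]))
    best = int(numpy.argmax(new_fitness))
    ga_instance.last_generation_elitism_as_list = [new_parents[best].tolist()]
    ga_instance.last_generation_elitism_indices = [best]
    ga_instance.previous_generation_fitness = list(new_fitness)

# Demo on a stand-in object (inside on_parents() you would pass the
# real pygad.GA instance instead):
ga = SimpleNamespace()
sync_parent_attributes(ga, numpy.array([[1.0, 2.0], [3.0, 4.0]]), [0.5, 0.9])
```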