esa / pagmo2

A C++ platform to perform parallel computations of optimisation tasks (global and local) via the asynchronous generalized island model.
https://esa.github.io/pagmo2/
GNU General Public License v3.0

Tracking information about evolving populations #154

Closed ckaldemeyer closed 6 years ago

ckaldemeyer commented 6 years ago

Hi again,

I have already looked through the documentation but could not find anything on my question.

My goal is to obtain information about an evolving population, as in this example:

import pygmo as pg
import numpy as np

class my_udp:
    def fitness(self, x):
        f1 = 1-np.exp(-((x[0]-1/np.sqrt(1))**2+(x[1]-1/np.sqrt(2))**2))
        f2 = 1-np.exp(-((x[0]+1/np.sqrt(1))**2+(x[1]+1/np.sqrt(2))**2))
        return [f1, f2]

    def get_nobj(self):
        return 2

    def get_bounds(self):
        return ([-4]*2, [4]*2)

pro = pg.problem(my_udp())

pop = pg.population(pro, size=8)

algo = pg.algorithm(pg.nsga2(gen=10))

# "manually" tracking vectors/fitnesses of the evolving population
for i in range(10):
    pop = algo.evolve(pop)
    fits, vectors = pop.get_f(), pop.get_x()
    print('Generation ', i, ':', fits)

# is it possible to extract information about the evolving population
# without the "manual" solution subsequently?
#algo.set_verbosity(100)
pop = algo.evolve(pop)
fits_end, vectors_end = pop.get_f(), pop.get_x()
print('After 10 generations:', fits_end)

So it already works with my manual solution, e.g. by saving the generational data in some data structure. But I am not sure whether there is an easier alternative, e.g. some method on the population object.

What would be the easiest way to keep track of evolving populations?

And why does a direct call of pop = algo.evolve(pop) always deliver exactly the n individuals set in the beginning? Isn't there something like a "Hall of Fame" for good individuals from former generations, so that the resulting Pareto front is bigger than n? Or is this a consequence of pagmo's parallel design?

Cheers Cord

bluescarni commented 6 years ago

It depends on what exactly you mean by tracking.

Some algorithms support some form of logging, meaning that you can have some information recorded during the evolution process and then later you can retrieve that information. Look for methods called get_log() and similar in the UDA (user-defined algorithms). Note that the log is not a general property of the pygmo.algorithm container, it is something that is specific to each and every UDA, so you will have to extract the UDA from the pygmo.algorithm if you want to access its log. Also note that each UDA does logging differently, so there's not a uniform way of accessing the logged information.

Otherwise, if you are running evolutions in an island, the "current" population of the island is always accessible (it gets replaced with the evolved population only at the end of the execution of the algorithm's evolve() method). So for instance you could periodically extract the current population of an island and extract from it the information you are interested in.

Finally, how the population is "evolved" is entirely dependent on the specific UDA. I believe at this time no algorithm discards or adds any individual from/to an input population, so the number of individuals will be a constant. For single-objective optimisation, there is the concept of a population "champion" (the best individual), but for multiobjective optimisation there's no champion defined and you will have to do the individual ranking on your own (perhaps using the ranking functions provided by pagmo). @darioizzo can tell you more precisely as he's the algorithmic expert.

ckaldemeyer commented 6 years ago

It depends what you mean exactly by tracking.

Some algorithms support some form of logging, meaning that you can have some information recorded during the evolution process and then later you can retrieve that information. Look for methods called get_log() and similar in the UDA (user-defined algorithms). Note that the log is not a general property of the pygmo.algorithm container, it is something that is specific to each and every UDA, so you will have to extract the UDA from the pygmo.algorithm if you want to access its log. Also note that each UDA does logging differently, so there's not a uniform way of accessing the logged information.

That (tracking=logging) was exactly what I was looking for. Thanks!

Otherwise, if you are running evolutions in an island, the "current" population of the island is always accessible (it gets replaced with the evolved population only at the end of the execution of the algorithm's evolve() method). So for instance you could periodically extract the current population of an island and extract from it the information you are interested in.

This would be realized like in my example above, right?

Finally, how the population is "evolved" is entirely dependent on the specific UDA. I believe at this time no algorithm discards or adds any individual from/to an input population, so the number of individuals will be a constant. For single-objective optimisation, there is the concept of a population "champion" (the best individual), but for multiobjective optimisation there's no champion defined and you will have to do the individual ranking on your own (perhaps using the ranking functions provided by pagmo). @darioizzo can tell you more precisely as he's the algorithmic expert.

Thanks for your clarification!

bluescarni commented 6 years ago

This would be realized like in my example above, right?

That would be a possibility. The other thing you could do is to analyse the population at regular time intervals. Since the island evolution is asynchronous, you can just periodically check what happens to the contained population (keeping in mind that the population is updated all at once every time the algorithm returns, so this makes sense only if you are calling the island's evolve method with n > 1).

The other thing I remembered is that I am not 100% sure what happens to the algorithm's log when running in an island. I think there's a possibility that the log gets lost when using a multiprocessing or clustering island, but I am not 100% sure. Please open a report if you figure that out; we would consider it a defect to rectify.

VittorioTrotta commented 6 years ago

Hello,

Taking a similar case from the first post, I wanted to know if there is any big difference between (1) the first approach and (2) the second approach in terms of the final result:

import pygmo as pg
import numpy as np
import pandas as pd
from pygmo import *

class sphere_function:
    def __init__(self, dim):
        self.dim = dim

    def fitness(self, x):
        return [sum(x*x)]

    def get_bounds(self):
        return ([-1] * self.dim, [1] * self.dim)

    def get_name(self):
        return "Sphere Function"

    def get_extra_info(self):
        return "\tDimensions: " + str(self.dim)

pro = pg.problem(sphere_function(3))

# First approach - setting the number of generations to one and looping n times
pop1 = pg.population(pro, size=8)
algo1 = pg.algorithm(pg.bee_colony(gen=1))

fits1 = []
for i in range(10):
    pop1 = algo1.evolve(pop1)
    fits1.append(pop1.champion_f)
print(fits1)

# Second approach - setting the number of generations to n
pop2 = pg.population(pro, size=8)
algo2 = pg.algorithm(pg.bee_colony(gen=10))
algo2.set_verbosity(1)
pop2 = algo2.evolve(pop2)
fits2 = pd.DataFrame(algo2.extract(bee_colony).get_log())
# get only the best result from each generation
print(fits2[2])

In the first approach it is easy to keep track of the evolving population and the evaluation function, but in the second I cannot find a way to keep track of the population.

Cheers

Vittorio

darioizzo commented 6 years ago

In the second case you can access the log. Maybe check the tutorials (https://esa.github.io/pagmo2/docs/python/tutorials/solving_schwefel_20.html) where this is shown in code.

As for the optimization flow, the two cases should be equivalent, since bee_colony has no memory of previous calls to evolve.

ckaldemeyer commented 6 years ago

This would be realized like in my example above, right?

That would be a possibility. The other thing you could do is to analyse at regular intervals of time the population. Since the island evolution is asynchronous, you can just periodically check what happens to the contained population (keeping in mind that the population is updated all at once every time the algorithm returns, so this makes sense only if you are calling the island's evolve method with n > 1).

The other thing I remembered, is that I am not 100% sure what happens to the algorithm's log when running in the island. I think there's a possibility that the log gets lost if using a multiprocessing or clustering island, but I am not 100% sure. Please open a report if you figure that out, we would consider it a defect to rectify.

Your answer somehow got lost in my mail flood. Thanks a lot! I'll do so!

VittorioTrotta commented 6 years ago

Thanks for the fast response @darioizzo !

In the second case you can access the log. Maybe check the tutorials (https://esa.github.io/pagmo2/docs/python/tutorials/solving_schwefel_20.html) where this is shown in code.

I checked the tutorial and the code and found out how to get the log of the champion's fitness for each generation. The problem is that the code does not return the champion decision vector for each generation.

Is there a way to get this data?

bluescarni commented 6 years ago

@VittorioTrotta

I have added to pygmo a decorator meta-problem that allows you (amongst other things) to modify a problem so that it logs the decision vectors passed to it. See for instance the example in the documentation:

https://esa.github.io/pagmo2/docs/python/tutorials/udp_meta_decorator.html#logging-fitness-evaluations

This should allow you to log the information you need.

I am closing the report for now, please open a new one if you find issues with the decorator problem approach.