DEAP / deap

Distributed Evolutionary Algorithms in Python
http://deap.readthedocs.org/
GNU Lesser General Public License v3.0

Non-optimal solution set returned #199

Closed pranavdheram closed 7 years ago

pranavdheram commented 7 years ago

Hello,

I am trying to generate an efficient frontier that captures the portfolios with the best risk-return trade-off. Suffice it to say, I am solving a multi-objective optimization problem: minimizing one objective (risk) while maximizing the other (return). The results, however, are extremely weird.

The problem itself is straightforward. I use code very similar to the example in the documentation. These are the settings:

toolbox.register("mate", tools.cxSimulatedBinaryBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0)
toolbox.register("mutate", tools.mutPolynomialBounded, low=BOUND_LOW, up=BOUND_UP, eta=20.0, indpb=1.0/N_Stocks)
toolbox.register("select", tools.selNSGA2)
toolbox.register("evaluate", evaluate_individual)

pop = toolbox.population(100)

From there it is like any other GA: evaluate -> select -> mate -> mutate -> repeat.

The final results, however, are very weird. I was wondering if there was a problem with my evaluate function, but look at how the algorithm progresses over generations. Once it finds a solution on the optimal frontier, isn't it supposed to assign it a very high fitness and ensure that this point is carried through to every subsequent generation? That clearly doesn't happen: it is losing points.

[Images: ga500, ga1500, ga3000, ga_result]

The images are in chronological order: 500, 1500, 3000, and 10000 generations. PS: only consider the yellow dots; the blue ones are 1000 random portfolios, among which I am trying to find the best.

Isn't this weird? Should I share my code too?

osm3000 commented 7 years ago

I don't know how it is done in your algorithm (NSGA-II, CMA-ES, etc.), but unless the algorithm specifically uses this kind of elitism (keeping the good points), it will not preserve them for later generations. It is good practice in evolutionary computation to keep a record of the different generations and of the good solutions. Sometimes the evolutionary operators cause the population to lose diversity after some point, leading to the collapse (tightly clustered points) you see in the last graph. You should consider playing with the hyperparameters of the algorithm and seeing how it performs.

fmder commented 7 years ago

Up until generation 3000 the results seem normal. It is generation 10000 that appears strange. However, those points are still Pareto optimal; the only weird thing is that they don't spread out as NSGA-II does in the general case. 7000 generations is a big step; try to pinpoint at what point you lose diversity on your front.