Formulas and Results doesn't match working with pd.Series

arthurire commented 3 years ago

I re-opened this issue because I finally found out what happened: the original pset was modified during the loop. Everything works fine if I regenerate the pset before updating the fitness.

I would like to know whether anything could modify the pset, especially while generating populations.
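
One way to narrow this down is to snapshot the data objects before an evaluation and compare them afterwards. The sketch below is only an illustration: it uses a plain dict as a stand-in for the name-to-object mapping that a primitive set keeps (in DEAP this appears to be `pset.context`, which is an assumption here), and `snapshot`, `changed_names` and `sub_inplace` are hypothetical names, not part of DEAP or of the code in this issue.

```python
import numpy as np


def snapshot(context):
    """Copy every ndarray in the mapping so later changes can be detected."""
    return {
        name: obj.copy()
        for name, obj in context.items()
        if isinstance(obj, np.ndarray)
    }


def changed_names(context, before):
    """Return the names whose arrays no longer equal their snapshot."""
    return sorted(
        name
        for name, old in before.items()
        if not np.array_equal(context[name], old)
    )


def sub_inplace(x, y):
    # Hypothetical buggy primitive: writes the result back into x.
    x -= y
    return x


# Stand-in for the terminals registered in a primitive set (assumption).
context = {"a": np.array([1.0, 2.0, 3.0]), "b": np.array([1.0, 1.0, 1.0])}

before = snapshot(context)
sub_inplace(context["a"], context["b"])
print(changed_names(context, before))  # ['a']
```

If a name shows up in the result, some primitive changed that object in place, which would explain why the printed formula and the compiled result disagree on later evaluations.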

================================================

Formulas and Results doesn't match

That is, the printed formula shows add(a,b), but the result of toolbox.compile(ind) corresponds to sub(a,b).

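import multiprocessing
import operator
import random

import numpy as np
import pandas as pd
from deap import algorithms, base, creator, gp, tools

creator.create("FitnessMax", base.Fitness, weights=(1.0,))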
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMax)

pset = gp.PrimitiveSetTyped("main", [], pd.DataFrame)

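# followed by all Primitives and Terminals

def evalSymbReg(ind):
    # pset declares no arguments, so gp.compile returns the evaluated
    # expression (a pd.DataFrame here) rather than a callable.
    result = toolbox.compile(ind, pset=pset)
    return result.mean()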

toolbox = base.Toolbox()
toolbox.register("expr", gp.genGrow, pset=pset, min_=1, max_=10)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)
toolbox.register("evaluate", evalSymbReg)
toolbox.register("select", tools.selNSGA2)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genGrow, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
toolbox.decorate(
    "mate", gp.staticLimit(key=operator.attrgetter("height"), max_value=10)
)
toolbox.decorate(
    "mutate", gp.staticLimit(key=operator.attrgetter("height"), max_value=10)
)
hof = tools.HallOfFame(10)

def main():
    random.seed(36)
    pop = toolbox.population(n=10000)
    stats = tools.Statistics(key=lambda ind: ind.fitness.values)
    stats.register("avg", np.mean)
    stats.register("min", np.min)
    stats.register("max", np.max)
    pop, logbook = algorithms.eaSimple(
        pop,
        toolbox,
        cxpb=0.5,
        mutpb=0.2,
        ngen=5,
        stats=stats,
        verbose=True,
        halloffame=hof,
    )
    return pop, logbook

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=20)
    toolbox.register("map", pool.map)
    pop_result,logbook = main()
    pool.close()

Here is the output:

gen  nevals  avg        min  max
0    10000   -0.745048  -10  1.72108
1    6036    -0.774616  -10  2.20018
2    5987    -0.760649  -10  1.86958
3    6021    -0.758109  -10  1.99002
4    5979    -0.74858   -10  1.99002

Here are the fitness.values of the individuals in the HOF:

(-0.5168971247476096,)
(-0.48964720411362084,)
(-0.3588201328636351,)
(1.6689032764601874,)
(1.741588916931653,)
(-0.6276309546449512,)
(1.6911644461920026,)
(1.5887612074710653,)
(1.5507840221851648,)
(1.4511766174393808,)

arthurire commented 3 years ago

This is probably a problem with np.array. Closed.

fmder commented 3 years ago

Is this open or closed?

arthurire commented 3 years ago

> Is this open or closed?

It is still open.

arthurire commented 3 years ago

> Is this open or closed?

Sorry, I think I can close this issue now. The problem was in my own function: I thought it passed a view() of the numpy.array, but it passed the array itself.
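
For readers who hit the same symptom, here is a minimal sketch of how in-place operations interact with copies, views and the array itself. It does not reproduce the function from this issue, which is not shown; `add_one_inplace` is a hypothetical primitive. Note that in NumPy a view shares memory with its base array, so only an explicit copy protects the original data.

```python
import numpy as np


def add_one_inplace(x):
    # Hypothetical primitive that modifies its argument in place.
    x += 1
    return x


data = np.zeros(3)

add_one_inplace(data.copy())  # operates on an independent copy
after_copy = data.copy()      # data is still all zeros

add_one_inplace(data.view())  # a view shares memory with data
after_view = data.copy()      # data is now all ones

add_one_inplace(data)         # the array itself
after_self = data.copy()      # data is now all twos

print(np.shares_memory(data, data.view()))  # True
print(np.shares_memory(data, data.copy()))  # False
```

Under these assumptions, a primitive that must not touch its inputs should either avoid in-place operators (`x + 1` instead of `x += 1`) or work on `x.copy()`.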

DEAP / deap

Formulas and Results doesn't match working with pd.Series #525