DEAP / deap

Distributed Evolutionary Algorithms in Python
http://deap.readthedocs.org/
GNU Lesser General Public License v3.0

Evaluate individual doesn't match individual.fitness #293

Open · dts333 opened 6 years ago

dts333 commented 6 years ago

Hi, I'm fairly new to this, so apologies in advance.

So I ran

pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.25, ngen=20, stats=mstats, halloffame=hof, verbose=True)

with natural fitness, and the log showed the max fitness peaking at 7.90 in gen 13 before dropping back to 6.37. Afterwards, I ran evaluate(hof[0], TrainingData) and got 0.45. I got the same result from toolbox.evaluate(hof[0]) and from

fitnesses = toolbox.map(toolbox.evaluate, hof)
fitnesses.__next__()

However, hof[0].fitness.values still reads 6.37. I've tried this multiple times, so I don't think the issue is any of the pandas code in my evaluate function altering data frames. I also added cloning to each of my mutators in case some of them were altering individuals after their fitness had been recorded, but since varAnd already clones, all this did, as you might expect, was make the run slower. I've been looking for the cause all day and I'm running out of ideas, so any help would be appreciated. This is a genetic programming project, in case that's relevant.
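
Concretely, the mismatch can be seen with a check like this sketch (assuming the toolbox and hof from the call above):

# Re-evaluate each hall-of-fame entry and compare against the fitness
# stored on the individual during evolution.
for ind in hof:
    stored = ind.fitness.values      # reads (6.37,)
    fresh = toolbox.evaluate(ind)    # returns (0.45,)
    print("stored:", stored, "recomputed:", fresh)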

fmder commented 6 years ago

A couple of things to check first: is your fitness created as a maximizing fitness (i.e. FitnessMax with positive weights), and is your evaluation function deterministic?

dts333 commented 6 years ago

Check and check. I named "FitnessMax" something else, but I assume that doesn't matter. The evaluation is deterministic: toolbox.evaluate(hof[0]) gives 0.45 no matter how many times I run it.

cmd-ntrf commented 6 years ago

What are your primitives?

dts333 commented 6 years ago

import operator
from deap import gp

def protectedDiv(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 1

pset = gp.PrimitiveSet("Main", arity=5)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(protectedDiv, 2)
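
For what it's worth, protectedDiv simply guards tree evaluation against division by zero:

protectedDiv(6, 3)  # 2.0
protectedDiv(6, 0)  # 1, instead of raising ZeroDivisionError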

cmd-ntrf commented 6 years ago

Since everything appears to be standard so far, I am afraid that you will have to either share your code or submit a toy version of it that presents the same issue. Otherwise, we currently do not have enough information to correctly pinpoint the source of the problem.

fmder commented 6 years ago

Can you provide a simplified version of your evaluation function?

dts333 commented 6 years ago

Here's the evaluation function. Each individual has five genes, each of which contains one tree. It's supposed to look at five data points, each over ten timepoints, for a number of data sets, and then predict which data set will increase the most at the next timepoint.
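
For orientation, the individual layout this code assumes appears to be the following (an inference from the indexing below, so treat it as an assumption):

# individual = [gene0, gene1, gene2, gene3, gene4]
# gene       = [tree, (a0, b0), (a1, b1), (a2, b2), (a3, b3), (a4, b4)]
# individual[i][0] is gene i's tree, and gene[j + 1] is an index pair
# selecting the j-th input for that tree from a flattened data row.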

def evaluate(individual, data):
    func0 = toolbox.compile(expr=gp.PrimitiveTree(individual[0][0]))
    func1 = toolbox.compile(expr=gp.PrimitiveTree(individual[1][0]))
    func2 = toolbox.compile(expr=gp.PrimitiveTree(individual[2][0]))
    func3 = toolbox.compile(expr=gp.PrimitiveTree(individual[3][0]))
    func4 = toolbox.compile(expr=gp.PrimitiveTree(individual[4][0]))
    funcs = [func0, func1, func2, func3, func4]

    def eval(x):
        score = 0
        for i in range(5):
            gene = individual[i]
            score += funcs[i](*(x.iat[5 * gene[j + 1][0] + gene[j + 1][1]] for j in range(5)))
        return score

    scores = pandas.DataFrame(data.apply(eval, axis=1))
    scores.columns = ['scores']
    top_scores = scores.groupby('timestamp')['scores'].transform(lambda x: x == x.max())

    fitness = TrainingResultData.loc(axis=0)[top_scores.values]
    fitness = fitness.product()
    return (fitness,)

chaltik commented 5 years ago

I am running into a very similar issue: having run the optimization with pop, logbook = algorithms.eaMuPlusLambda(...), I am finding a notable discrepancy between the fitness values calculated using toolbox.map and those from toolbox.evaluate manually iterated over the population:

pop_fitnesses_tbmap = np.array([pf[0] for pf in toolbox.map(toolbox.evaluate, pop)])
pop_fitnesses_tbeval = np.array([toolbox.evaluate(ind)[0] for ind in pop])
pop_fitnesses_direct = np.array([direct_eval(ind) for ind in pop])

The first line gives numbers that are nothing like the second and the third, which are identical, as expected (the direct_eval function is registered via toolbox.register("evaluate", direct_eval)):

print(np.sum(np.abs(pop_fitnesses_tbmap - pop_fitnesses_tbeval)))
15604327.392578125

print(np.sum(np.abs(pop_fitnesses_direct - pop_fitnesses_tbeval)))
0.0
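
For comparison, here is a self-contained version of that consistency check (a sketch with a stand-in objective, not the actual setup from this thread): with the default toolbox.map, which is just Python's builtin map, the two arrays agree.

import numpy as np
from deap import base, creator

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

def direct_eval(ind):
    # Stand-in deterministic objective: sum of squares, returned as a 1-tuple.
    return (sum(x * x for x in ind),)

toolbox = base.Toolbox()
toolbox.register("evaluate", direct_eval)

pop = [creator.Individual([float(i), i + 1.0]) for i in range(5)]

pop_fitnesses_tbmap = np.array([pf[0] for pf in toolbox.map(toolbox.evaluate, pop)])
pop_fitnesses_tbeval = np.array([toolbox.evaluate(ind)[0] for ind in pop])
print(np.sum(np.abs(pop_fitnesses_tbmap - pop_fitnesses_tbeval)))  # 0.0

If toolbox.map has been re-registered to something else (a common DEAP pattern is toolbox.register("map", pool.map) for multiprocessing), comparing that registration against the builtin map would be a natural first check.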