DEAP / deap

Distributed Evolutionary Algorithms in Python
http://deap.readthedocs.org/
GNU Lesser General Public License v3.0

Evaluate individual doesn't match individual.fitness #293

Open · dts333 opened 6 years ago

dts333 commented 6 years ago

Hi, I'm fairly new to this, so apologies in advance.

So I ran

pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.25, ngen=20, stats=mstats, halloffame=hof, verbose=True)

with natural fitness, and the log showed the max fitness peaking at 7.90 in gen 13 before dropping back to 6.37. Afterwards, I ran evaluate(hof[0], TrainingData) and got 0.45. I got the same result from toolbox.evaluate(hof[0]) and from

fitnesses = toolbox.map(toolbox.evaluate, hof)
fitnesses.__next__()

However, hof[0].fitness.values still reads 6.37. I've tried this multiple times, so I don't think the issue is any of the pandas code in my evaluate function altering data frames. I also added cloning to each of my mutators in case some of them were altering individuals after their fitness had been recorded, but since varAnd already clones, all this did, as you might expect, was make the run slower. I've been looking for the cause all day and I'm running out of ideas, so any help would be appreciated. This is a genetic programming project, in case that's relevant.
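
Concretely, the mismatch can be seen with a check like this sketch (assuming the toolbox and hof from the call above):

# Re-evaluate each hall-of-fame entry and compare against the fitness
# stored on the individual during evolution.
for ind in hof:
    stored = ind.fitness.values      # reads (6.37,)
    fresh = toolbox.evaluate(ind)    # returns (0.45,)
    print("stored:", stored, "recomputed:", fresh)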

fmder commented 6 years ago

A couple of things to check first: is your fitness created as a maximizing fitness (i.e. FitnessMax with positive weights), and is your evaluation function deterministic?

dts333 commented 6 years ago

Check and check. I named "FitnessMax" something else, but I assume that doesn't matter. The evaluation is deterministic: toolbox.evaluate(hof[0]) gives 0.45 no matter how many times I run it.

cmd-ntrf commented 6 years ago

What are your primitives?

dts333 commented 6 years ago

import operator
from deap import gp

def protectedDiv(numerator, denominator):
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return 1

pset = gp.PrimitiveSet("Main", arity=5)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(protectedDiv, 2)
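
For what it's worth, protectedDiv simply guards tree evaluation against division by zero:

protectedDiv(6, 3)  # 2.0
protectedDiv(6, 0)  # 1, instead of raising ZeroDivisionError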

cmd-ntrf commented 6 years ago

Since everything appears to be standard so far, I am afraid that you will have to either share your code or submit a toy version of it that presents the same issue. Otherwise, we currently do not have enough information to correctly pinpoint the source of the problem.

fmder commented 6 years ago

Can you provide a simplified version of your evaluation function?

dts333 commented 6 years ago

Here's the evaluation function. Each individual has five genes, each of which contains one tree. It's supposed to look at five data points, each over ten timepoints, for a number of data sets, and then predict which data set will increase the most at the next timepoint.
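
For orientation, the individual layout this code assumes appears to be the following (an inference from the indexing below, so treat it as an assumption):

# individual = [gene0, gene1, gene2, gene3, gene4]
# gene       = [tree, (a0, b0), (a1, b1), (a2, b2), (a3, b3), (a4, b4)]
# individual[i][0] is gene i's tree, and gene[j + 1] is an index pair
# selecting the j-th input for that tree from a flattened data row.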

def evaluate(individual, data):
    func0 = toolbox.compile(expr=gp.PrimitiveTree(individual[0][0]))
    func1 = toolbox.compile(expr=gp.PrimitiveTree(individual[1][0]))
    func2 = toolbox.compile(expr=gp.PrimitiveTree(individual[2][0]))
    func3 = toolbox.compile(expr=gp.PrimitiveTree(individual[3][0]))
    func4 = toolbox.compile(expr=gp.PrimitiveTree(individual[4][0]))
    funcs = [func0, func1, func2, func3, func4]

    def eval(x):
        score = 0
        for i in range(5):
            gene = individual[i]
            score += funcs[i](*(x.iat[5 * gene[j + 1][0] + gene[j + 1][1]] for j in range(5)))
        return score

    scores = pandas.DataFrame(data.apply(eval, axis=1))
    scores.columns = ['scores']
    top_scores = scores.groupby('timestamp')['scores'].transform(lambda x: x == x.max())

    fitness = TrainingResultData.loc(axis=0)[top_scores.values]
    fitness = fitness.product()
    return (fitness,)

chaltik commented 5 years ago

I am running into a very similar issue: having run the optimization with pop, logbook = algorithms.eaMuPlusLambda(...), I am finding a notable discrepancy between the fitness values calculated using toolbox.map and those from toolbox.evaluate manually iterated over the population:

pop_fitnesses_tbmap = np.array([pf[0] for pf in toolbox.map(toolbox.evaluate, pop)])
pop_fitnesses_tbeval = np.array([toolbox.evaluate(ind)[0] for ind in pop])
pop_fitnesses_direct = np.array([direct_eval(ind) for ind in pop])

The first line gives numbers that are nothing like the second and the third, which are identical, as expected (the direct_eval function is registered via toolbox.register("evaluate", direct_eval)):

print(np.sum(np.abs(pop_fitnesses_tbmap - pop_fitnesses_tbeval)))
15604327.392578125

print(np.sum(np.abs(pop_fitnesses_direct - pop_fitnesses_tbeval)))
0.0
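
For comparison, here is a self-contained version of that consistency check (a sketch with a stand-in objective, not the actual setup from this thread): with the default toolbox.map, which is just Python's builtin map, the two arrays agree.

import numpy as np
from deap import base, creator

creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessMin)

def direct_eval(ind):
    # Stand-in deterministic objective: sum of squares, returned as a 1-tuple.
    return (sum(x * x for x in ind),)

toolbox = base.Toolbox()
toolbox.register("evaluate", direct_eval)

pop = [creator.Individual([float(i), i + 1.0]) for i in range(5)]

pop_fitnesses_tbmap = np.array([pf[0] for pf in toolbox.map(toolbox.evaluate, pop)])
pop_fitnesses_tbeval = np.array([toolbox.evaluate(ind)[0] for ind in pop])
print(np.sum(np.abs(pop_fitnesses_tbmap - pop_fitnesses_tbeval)))  # 0.0

If toolbox.map has been re-registered to something else (a common DEAP pattern is toolbox.register("map", pool.map) for multiprocessing), comparing that registration against the builtin map would be a natural first check.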