DEAP / deap

Distributed Evolutionary Algorithms in Python
http://deap.readthedocs.org/
GNU Lesser General Public License v3.0

Multiprocessor evolutionary algorithm consistently finds worse solution than single threaded #439

Open Dadle opened 4 years ago

Dadle commented 4 years ago

Not sure if this is a feature or a bug, but I would love some input either way. Whenever I replace `map` to run DEAP parallelized, as in the code at the bottom, I consistently get worse fitness development than with a single-threaded version of the same code (obtained by just commenting out the `dask_map` line). These results have been replicated with both threading via `multiprocessing.dummy` and Dask, over 5 runs each on 2 optimization problems.

What causes this consistent difference of 400 fitness between the single-threaded and parallelized versions of the algorithm?

Note that fitness is minimized in the plots below.

Example fitness, single threaded: [image]

Example fitness, Dask multiprocessing with 2 CPUs: [image]

Setup of evolution:

    ea.creator.create("FitnessMin", ea.base.Fitness, weights=(-1.0,))
    ea.creator.create("Individual", list, fitness=ea.creator.FitnessMin)
    self.toolbox = ea.base.Toolbox()

    self.toolbox.register("attribute", random.randint, 0, self.MAX_NUMBER_OF_CLUSTERS - 1)
    self.toolbox.register("map", self.dask_map)
    self.toolbox.register("individual", tools.initRepeat, creator.Individual,
                          self.toolbox.attribute, n=self.genome_len)
    self.toolbox.register("population", tools.initRepeat, list,
                          self.toolbox.individual)

    self.toolbox.register("evaluate", self.eval_dsm_min)
    self.toolbox.register("mate", tools.cxTwoPoint)
    self.toolbox.register("mutate", tools.mutUniformInt, indpb=survey.param_indpb, low=0,
                          up=self.MAX_NUMBER_OF_CLUSTERS - 1)
    # Use elitism + normal ranking
    self.toolbox.register("select", tools.selTournament, tournsize=survey.param_tournsize)

Dask map function used:

    # assumes `import dask.bag as db`
    def dask_map(self, func, iterable):
        bag = db.from_sequence(iterable).map(func)
        return bag.compute()
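One thing worth ruling out when swapping in a replacement for `map` is whether it preserves the order and values of results, since DEAP assigns fitnesses back to individuals positionally. A minimal sketch of such a check, using the standard-library `multiprocessing.dummy` thread pool as a stand-in for the parallel backend (the issue reports the same behavior with it; `square` and `thread_map` are hypothetical names for illustration):

```python
from multiprocessing.dummy import Pool  # thread-based stand-in for a parallel map


def square(x):
    return x * x


def thread_map(func, iterable):
    # Pool.map blocks until all results are ready and returns them
    # in the same order as the input, like the builtin map
    with Pool(2) as pool:
        return pool.map(func, iterable)


values = list(range(10))
assert thread_map(square, values) == list(map(square, values))
```

If this check fails for a given backend, the evolutionary loop would silently pair individuals with the wrong fitness values.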

DMTSource commented 4 years ago

This may get more responses in the discussion thread vs the bug reporting system: https://groups.google.com/forum/#!forum/deap-users

Can you run a test to confirm that an evaluation is deterministic between the two cases? Essentially, map the same individual to the cluster for evaluation and see what happens: are the results all identical? Off the top of my head, I would guess that the global scope, or something similar, is not being set up in the workers the same way the 'main' process had its scope constructed.
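The suggested determinism check can be sketched as follows. Here `eval_individual` is a hypothetical, deterministic-by-construction placeholder for the issue's `eval_dsm_min`, and a standard-library `multiprocessing.dummy` thread pool stands in for the Dask cluster:

```python
import random
from multiprocessing.dummy import Pool  # thread pool standing in for the Dask workers


# Hypothetical stand-in for eval_dsm_min; deterministic by construction
def eval_individual(ind):
    return (sum(ind),)


random.seed(0)
individual = [random.randint(0, 4) for _ in range(20)]

# Evaluate the same individual repeatedly, serially and in parallel
serial = list(map(eval_individual, [individual] * 5))
with Pool(2) as pool:
    parallel = pool.map(eval_individual, [individual] * 5)

# If evaluation is deterministic and worker state matches the main
# process, every result should be identical across both maps
assert all(r == serial[0] for r in serial)
assert parallel == serial
```

If the real evaluation function fails this test under the parallel map, the workers' state (globals, loaded data, random seeds) likely differs from the main process.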