dwdyer / watchmaker

The Watchmaker Framework for Evolutionary Computation
https://watchmaker.uncommons.org
Apache License 2.0
206 stars 79 forks source link

Performance: Skip duplicate candidates with IdentityHashMap and make AbstractEvolutionEngine thread safe #31

Open bernard01 opened 6 years ago

bernard01 commented 6 years ago

We can improve evolution performance of the AbstractEvolutionEngine by exploiting the specified SelectionStrategy behavior.

interface SelectionStrategy#select comments:

... the same individual may potentially be selected more than once

This has a consequence in AbstractEvolutionEngine#evaluatePopulation()

which will then evaluate identical candidates more than once.

The result is reduced evolution performance which is a general problem not specific to any application.

One gets partial relief by maintaining the fitness value in the candidate itself (not something that is always possible) and evaluating conditionally. That requires synchronizing on the candidate in FitnessEvaluator#getFitness(). However, this scheme runs into thread contention problems with a high number of identical candidates, again resulting in reduced evolution performance.

The permanent solution is to only evaluate a reduced size set of distinct candidates. Synchronizing as described above is then no longer required.

So in AbstractEvolutionEngine#evaluatePopulation()

// Each entry contains the count of skipped duplicates (in addition to the 1st).
Map<T, Integer> duplicatesCountMap = new IdentityHashMap();
for (T candidate: population){
    if(duplicatesCountMap.containsKey(candidate)){
        duplicatesCountMap.put(candidate, duplicatesCountMap.get(candidate) + 1);
    }else{
        duplicatesCountMap.put(candidate, 0);
    }
}

and then in both single threaded and multi-threaded if branches

replace for (T candidate : population) with for (T candidate : duplicatesCountMap.keySet())

and then later some post-processing to add the skipped duplicates back to the results so that the population size remains the same (that is why we count the duplicates):

final List<EvaluatedCandidate<T>> skippedPopulation = new ArrayList<EvaluatedCandidate<T>>(population.size());
for(EvaluatedCandidate<T> evaluatedCandidate : evaluatedPopulation){
    final Integer skippedCount = duplicatesCountMap.get(evaluatedCandidate.getCandidate());
    for(int index = 0; index < skippedCount; index++){
        skippedPopulation.add(evaluatedCandidate);
    }
}
evaluatedPopulation.addAll(skippedPopulation);

Please refer to the attached source file AbstractEvolutionEngine.zip

.

bernard01 commented 6 years ago

AbstractEvolutionEngine is not thread safe. That is because it is likely that multiple threads evaluate the same candidate at the same time.

We fix that and improve evolution performance of the AbstractEvolutionEngine at the same time by exploiting the specified SelectionStrategy behavior.

interface SelectionStrategy#select comments:

... the same individual may potentially be selected more than once

This has a consequence in AbstractEvolutionEngine#evaluatePopulation()

which will then evaluate identical candidates more than once.

The result is contention and reduced evolution performance which is a general problem not specific to any application.

One gets partial relief by maintaining the fitness value in the candidate itself (not something that is always possible) and evaluating conditionally. That requires synchronizing on the candidate in FitnessEvaluator#getFitness() because the engine is not thread safe. However, this scheme runs into thread contention problems with a high number of identical candidates, again resulting in reduced evolution performance.

The permanent solution is to only evaluate a reduced size set of distinct candidates. Synchronizing as described above is then no longer required.

So in AbstractEvolutionEngine#evaluatePopulation()

    // Each entry contains the count of skipped duplicates (in addition to the 1st).
    Map<T, Integer> duplicatesCountMap = new IdentityHashMap();
    for (T candidate: population){
        if(duplicatesCountMap.containsKey(candidate)){
            duplicatesCountMap.put(candidate, duplicatesCountMap.get(candidate) + 1);
        }else{
            duplicatesCountMap.put(candidate, 0);
        }
    }

and then in both single threaded and multi-threaded if branches

replace for (T candidate : population) with for (T candidate : duplicatesCountMap.keySet())

and then later some post-processing to add the skipped duplicates back to the results so that the population size remains the same (that is why we count the duplicates):


final List<EvaluatedCandidate<T>> skippedPopulation = new ArrayList<EvaluatedCandidate<T>>(population.size());
for(EvaluatedCandidate<T> evaluatedCandidate : evaluatedPopulation){
    final Integer skippedCount = duplicatesCountMap.get(evaluatedCandidate.getCandidate());
        for(int index = 0; index < skippedCount; index++){
            skippedPopulation.add(evaluatedCandidate);
        }
}
evaluatedPopulation.addAll(skippedPopulation);

Please refer to the attached source code. AbstractEvolutionEngine.zip

JoonasVali commented 3 years ago

Shouldn't really be a problem with CachingFitnessEvaluator, which caches the fitnesses.