AureumChaos / LEAP

A general purpose Library for Evolutionary Algorithms in Python.
Academic Free License v3.0

Grouped Coevolution #168

Closed SigmaX closed 1 year ago

SigmaX commented 3 years ago

The grouped evaluation mechanism implemented in #123 works great, but coevolution (CooperativeEvaluate) doesn't make use of it. I need coevolution + grouped evaluation for an application involving populations that are evaluated in parallel on a GPU.

Tweak it so it does, or rather, can when requested.

SigmaX commented 3 years ago

A couple ways I could go here.

  1. Add a CooperativeEvaluate.grouped() function that works like CooperativeEvaluate.__call__(), but on chunks instead of single individuals.

  2. Refactor coevolution to use a special Problem wrapper, instead of its own evaluation operator. This way grouped evaluation would work the same as anywhere else—by choosing the standard evaluation operator. Perhaps the cleaner option?

SigmaX commented 2 years ago

To summarize:

We can't really do grouped evaluation with an iteriter_op. So we definitely need a new operator here.

Idea: how about a straightforward listlist_op version of CooperativeEvaluate? It would be natural to have it call grouped_evaluate() as a subroutine, allowing grouped evaluation logic to be enabled by custom Problem implementations. This would be approach (1) mentioned in my previous comment.
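A rough, self-contained sketch of what that listlist_op-style operator could look like (all names here, such as ToyProblem and cooperative_grouped_evaluate, are illustrative stand-ins, not LEAP's actual API):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Individual:
    genome: List[int]
    fitness: float = None

class ToyProblem:
    """Stand-in for a Problem with a grouped-evaluation hook."""
    def evaluate(self, phenome):
        return sum(phenome)

    def grouped_evaluate(self, phenomes):
        # A real implementation might ship the whole chunk to a GPU;
        # here we just loop over it.
        return [self.evaluate(p) for p in phenomes]

def cooperative_grouped_evaluate(subpop: List[Individual],
                                 other_subpops: List[List[Individual]],
                                 problem: ToyProblem,
                                 select: Callable) -> List[Individual]:
    """listlist_op-style operator: form full solutions for the whole
    subpopulation first, then evaluate them with one grouped call."""
    full_solutions = []
    for ind in subpop:
        collaborators = [select(pop) for pop in other_subpops]
        # Combine partial genomes into one full solution
        full = list(ind.genome)
        for c in collaborators:
            full.extend(c.genome)
        full_solutions.append(full)
    for ind, fitness in zip(subpop, problem.grouped_evaluate(full_solutions)):
        ind.fitness = fitness
    return subpop
```

Here collaborators are drawn once per individual; the real operator would also need to honor something like num_trials.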


My alternative idea (2) was a Problem wrapper—say, CooperativeProblem. To make Problem responsible for coevolutionary logic, we would need to give it an interface such that you can hand it a partial individual, tell it which subpopulation that partial solution belongs to, and also give it access to the current population as a whole so it could go and find collaborators to construct full solutions.

I think the way to do this, while respecting the Problem interface (which takes just a phenome as input to its evaluate() method, no other arguments), is to tell the CooperativeProblem at construction time which subpopulation it will receive partial solutions from.

This suggests an arguably elegant (or at least intuitive) view of cooperative coevolution: we will have several subpopulations, each of which will have its own fitness function (a CooperativeProblem instance), configured specifically for that subproblem. It just happens that these fitness functions are a function of other populations.
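One possible shape for such a class (a self-contained sketch; the class name, the context dict, and the wrapped_evaluate parameter are assumptions for illustration, not LEAP's actual interface):

```python
import random
from typing import Callable, List

class CooperativeProblemSketch:
    """A fitness function bound to one subpopulation: it is told at
    construction time which subpopulation its partial solutions come
    from, and pulls collaborators from the others at evaluation time."""

    def __init__(self, subpop_index: int, context: dict,
                 wrapped_evaluate: Callable):
        self.subpop_index = subpop_index
        self.context = context            # shared view of all subpopulations
        self.wrapped_evaluate = wrapped_evaluate

    def evaluate(self, partial_genome: List[int]) -> float:
        full = []
        for i, subpop in enumerate(self.context['subpopulations']):
            if i == self.subpop_index:
                full.extend(partial_genome)         # slot in the partial solution
            else:
                full.extend(random.choice(subpop))  # pick a collaborator
        return self.wrapped_evaluate(full)
```

Each subpopulation then gets its own instance (index 0, index 1, and so on), which is exactly the "interdependent fitness functions" view described above.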

Otherwise, it behaves much like, say, a heterogeneous island model (sans migration).

To my surprise, I actually like idea (2). To implement it... let's see...

Circling back: besides an elegant view of coevolution as "multiple sub-populations with their own (interdependent) fitness functions," what does this buy us?


tl;dr:

(1) is definitely simpler and meets my immediate need.

(2) is not that complicated, and has an arguable elegance about it. Hmm.

SigmaX commented 2 years ago

Complication:

I started implementing (2). It's mostly a straightforward refactor, converting our existing CooperativeEvaluate operator into a new CooperativeProblem class that contains the same logic.

But a Problem takes a phenome as input. In coevolution, typically we want to combine genomes. (This has me realizing that one might want to do either one: combine at the genotypic level, or at the phenotypic level.)

The problem is that, if we only support one of the two, genotypic combination is the more important and standard choice. But I'm not sure that's possible with our Problem interface, since evaluate() receives a phenome, not a genome.
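The distinction can be made concrete with a toy decoder (illustrative code, not LEAP's):

```python
from typing import List

def decode(genome: List[int]) -> int:
    """Toy decoder: read a bitstring as an unsigned integer."""
    return int(''.join(str(b) for b in genome), 2)

def combine_genotypic(partials: List[List[int]]) -> int:
    """Concatenate the partial genomes, then decode the full genome once."""
    full_genome = [bit for p in partials for bit in p]
    return decode(full_genome)

def combine_phenotypic(partials: List[List[int]]) -> List[int]:
    """Decode each partial genome separately, then combine the phenomes."""
    return [decode(p) for p in partials]
```

For partials [1, 0] and [1, 1], the genotypic route decodes the joined bitstring into one value, while the phenotypic route yields one value per partial; the two define genuinely different search problems.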

Options:

SigmaX commented 2 years ago

Picking this back up after a detour in #191.

Third way followed: #191 refactors the Problem interface to take an Individual instead of just its phenome. This gives me the flexibility to implement a coevolutionary Problem that can combine individuals however I want.

SigmaX commented 2 years ago

Implementation complete and tests/example are passing.

I just want to make sure the resulting algorithm behaves the same as the old one before merging and closing this issue.

SigmaX commented 2 years ago

Collecting some data for a regression test:

for i in $(seq 0 99); do
    echo ${i};
    python ../examples/advanced/coevolution_via_fitness_functions.py > coevolution_via_problem_run${i}.csv;
done

And the old version:

for i in $(seq 0 99); do
    echo ${i};
    python ../examples/advanced/coevolution.py > coevolution_via_operator_run${i}.csv;
done
SigmaX commented 2 years ago

Interestingly, the Problem-based implementation appears to run much faster than the CooperativeEvaluate operator implementation. I'm not sure why that is.

SigmaX commented 2 years ago

Behavior checks out: the new coevolution behaves like the old one in terms of mean fitness in each subpopulation:

(screenshot: mean-fitness curves, new vs. old implementation)

Script I used to analyze the data:

%%bash
mkdir -p preprocessed/
for f in *.csv; do
   cat ${f} \
       | sed -E 's/\[|\]//g' \
       | sed 's/subpop_bsf/subpop_0, subpop_1, subpop_2, subpop_3/g' \
       > preprocessed/${f}
done
from glob import glob
import re

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

plt.style.use('ggplot')

##### Load the data
def get_runs(version: str):
    """Load all of the files for our single-task runs into a single dataframe."""

    def load_file(f):
        """Load a single file into a dataframe."""
        df = pd.read_csv(f, skipinitialspace=True, comment='#')

        # Get the job id from the file name
        job_finds = re.findall('_run([0-9]*).csv', f)
        assert(len(job_finds) == 1)
        job = job_finds[0]
        df['job'] = job

        # Correct the paradigm column (since we gave it the wrong value in the experiment)
        df['version'] = version

        return df

    #One file per *run* (containing all tasks)
    pattern = f"preprocessed/coevolution_via_{version}_run*.csv"
    files = glob(pattern)
    assert(len(files) > 0), f"No files found for pattern '{pattern}'."

    dfs = [ load_file(f) for f in files ]
    df = pd.concat(dfs)

    #assert(len(df) == 100*2001), f"Got {len(df)} rows total, but expected {100*2000}."
    #assert(len(df.job.unique()) == 100)
    assert(len(df.generation.unique()) == 2001), f"Expected {2001} different generations, but got {len(df.generation.unique())}: {df.generation.unique()}."

    return df.reset_index(drop=True)

# Example
#df = get_runs('problem')
#df

# Wide to long
df = pd.concat([get_runs('problem'), get_runs('operator')]).reset_index()
df = pd.melt(df, id_vars=['job', 'generation', 'version'], value_vars=['subpop_0', 'subpop_1', 'subpop_2', 'subpop_3',])
# df

# Plot
plt.figure(figsize=(12, 8))
sns.lineplot(data=df[df.generation < 50],
             x='generation',
             y='value',
             hue='version',
             style='variable')
#plt.ylim(10, 20)
plt.yscale('log')
SigmaX commented 2 years ago

Reopening because Kexin encountered two issues:

  1. The CooperativeProblem class seems to inherit the evaluate_multiple function from the Problem class, which evaluates a group of individuals sequentially, rather than in parallel.

  2. When I tried to add a log stream to the “coevolution_via_fitness_functions.py” example, I encountered the error below. It seems like the _log_trial function is expecting all_collaborators to be a list of Individual objects, but the function _choose_collaborators returns a list of genomes, which causes a mismatch.

  File "/home/kexin/LEAP/leap_ec/problem.py", line 472, in evaluate
    self._log_trial(
  File "/home/kexin/LEAP/leap_ec/problem.py", line 516, in _log_trial
    'genome'                    : collab.genome,
AttributeError: 'numpy.ndarray' object has no attribute 'genome'
SigmaX commented 2 years ago

Looking at (1): should be easy to fix. In Kexin's application, CooperativeProblem.wrapped_problem is an instance of ExternalProcessProblem (which interfaces with CARLsim). I just need to write a CooperativeProblem.evaluate_multiple() function that collects combined phenomes for all individuals in a subpopulation at once (using the same logic as CooperativeProblem.evaluate()).
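A minimal sketch of what that override could look like (class and parameter names are hypothetical; the point is batching the combined phenomes into one call):

```python
from typing import Callable, List

class GroupedCooperativeEval:
    """Sketch of an evaluate_multiple() override: build the combined
    phenome for every individual in the subpopulation first, then make a
    single batched call so a wrapped problem (e.g. one backed by an
    external process) can evaluate them in parallel."""

    def __init__(self, combine: Callable,
                 wrapped_evaluate_multiple: Callable):
        self.combine = combine                  # same logic evaluate() uses
        self.wrapped_evaluate_multiple = wrapped_evaluate_multiple

    def evaluate_multiple(self, phenomes: List) -> List[float]:
        combined = [self.combine(p) for p in phenomes]
        return self.wrapped_evaluate_multiple(combined)  # one batched call
```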

SigmaX commented 2 years ago

@kexinchenn identified another bug: it seems that individuals are not correctly being assigned fitness values.

In both examples/advanced/coevolution.py and examples/advanced/coevolution_via_fitness_functions.py, if I instrument the ops.random_selection operator to print out the fitnesses of collaborators at the moment that they are selected, they all have the initial arbitrary fitness value of -100:

    Chose individual [1 0 0] -100, fitness: -100
    Chose individual [1 0 1 1] -100, fitness: -100
    Chose individual [0 0 0 1 0] -100, fitness: -100
    Chose individual [1 1 0] -100, fitness: -100
    Chose individual [1 1 1 1] -100, fitness: -100
    Chose individual [0 0 0 0 1] -100, fitness: -100
    Chose individual [0 1 0] -100, fitness: -100
    Chose individual [1 0 1 1] -100, fitness: -100
    Chose individual [0 1 0 1 0] -100, fitness: -100
    Chose individual [1 1 1] -100, fitness: -100
    Chose individual [1 0 1 0] -100, fitness: -100
    Chose individual [0 0 1 1 1] -100, fitness: -100
    Chose individual [1 0 0] -100, fitness: -100
    Chose individual [0 0 1 0] -100, fitness: -100
    Chose individual [1 1 1 1 1] -100, fitness: -100
    Chose individual [1 0 0] -100, fitness: -100
    Chose individual [0 1 0 0] -100, fitness: -100
    Chose individual [0 1 0 1 0] -100, fitness: -100
    Chose individual [0 1 0] -100, fitness: -100
    Chose individual [1 1 0 0] -100, fitness: -100
    Chose individual [1 0 1 1 0] -100, fitness: -100
    Chose individual [1 1 1] -100, fitness: -100
    Chose individual [0 0 1 1] -100, fitness: -100
    Chose individual [0 1 0 1 0] -100, fitness: -100
14, [13, 13.666666666666666, 15, 14.333333333333334]

The last line is the generation boundary, and we do see normal fitness values there.

This suggests that perhaps fitnesses for combined individuals are being calculated correctly, but fitnesses for partial solutions within each subpopulation are not being assigned...

SigmaX commented 2 years ago

Debugging.

I'm seeing fitness values in the subpopulation updated correctly at the end of each generation (governed by line 308 in the following, which is in the main loop of multi_population_ea()):

(screenshot: multi_population_ea() main loop)

But when we drill down into CooperativeEvaluate, at the moment where it looks at context to grab a reference to the subpopulations, the fitness values are all -100 again:

(screenshot: debugger view inside CooperativeEvaluate)

So it seems that something is happening that resets the fitnesses (or the references to the subpops?) in between the generation boundary and when we run the CooperativeEvaluate operator...

SigmaX commented 2 years ago

So, still looking at coevolution.py, the pipeline is

# Operator pipeline
shared_pipeline=[
   ops.tournament_selection,
   ops.clone,
   mutate_bitflip(expected_num_mutations=1),
   ops.CooperativeEvaluate(
       num_trials=3,
       collaborator_selector=ops.random_selection,
       log_stream=log_stream),
   ops.pool(size=pop_size)
]

The only smoke I can find is that clone() always resets fitness. But it sets fitness to None, not -100, and it affects only the subpopulation currently being processed (which is fine!).

It seems as if the collaborator selection operator in CooperativeEvaluate is being bound to the original, initial population instead of the updated population from context. I can't yet see where that initial population could be getting copied and retained, however.

SigmaX commented 2 years ago

Debugging crumb:

The population found in context has all -100 fitnesses even at the time the tournament_selection operator is executed.

SigmaX commented 2 years ago

Found it. Stupid bug in multi_population_ea. We have a pops variable that is supposed to point to the same thing as context, but when evaluating the initial population we overwrite the reference—from that point on, the two references point to different lists.

(screenshot: the offending code in multi_population_ea)
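The failure mode is ordinary Python aliasing, and it reduces to a few lines (a toy reproduction, not the actual multi_population_ea code):

```python
def buggy_loop(context):
    pops = context['populations']      # alias the list held in context
    # Evaluating the initial population by REBINDING the name breaks the
    # alias: pops and context['populations'] now point to different lists.
    pops = [[(ind, 0.0) for ind in pop] for pop in pops]
    return pops

def fixed_loop(context):
    pops = context['populations']
    # Mutating in place keeps both names pointing at the same object.
    for i, pop in enumerate(pops):
        pops[i] = [(ind, 0.0) for ind in pop]
    return pops
```

After buggy_loop, updates made through pops are invisible to anything reading context, which matches the "fitnesses are all -100" symptom.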
SigmaX commented 2 years ago

Fix is simple (just reorder the lines of code).

Writing a unit test to protect against regression will take some thought.
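One possible shape for that test (a hypothetical sketch; a real test would run a generation of the EA where the comment indicates):

```python
def test_context_population_alias():
    """Regression sketch: after evaluation, operators that read
    subpopulations from context must see the same list objects the
    main loop updates."""
    context = {'populations': [[0.5], [0.7]]}
    pops = context['populations']
    # ... run one generation of the EA here (omitted in this sketch) ...
    assert pops is context['populations'], \
        "main-loop reference diverged from context"
```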