AureumChaos / LEAP

A general purpose Library for Evolutionary Algorithms in Python.
Academic Free License v3.0

Quality Diversity Problems #279

Open lukepmccombs opened 1 year ago

lukepmccombs commented 1 year ago

Quality Diversity algorithms record an additional characteristic when evaluating individuals: their behavior descriptor. Depending on the algorithm, this can be translated into fitness in a variety of ways, and the process is almost always stateful. The current Problem cannot introduce new member variables to the Individual without subclassing Individual and altering the evaluate interface. Altering the Problem to translate behavior into fitness at evaluate time is not an option either, since under distributed evaluation each copy of the problem would be accessing a different state.

One option is to store the quality and behavior descriptors alongside fitness in the value returned by .evaluate(), then decompose and convert them to fitness using an operator, but this isn't super appealing to me.

A pie in the sky ask would be to alter the base Problem and Individual evaluate interfaces to support QD throughout. My idea is something along these lines, barring more intuitive name choices:

Problem: A function, evaluate_fitness, that calculates the traditional fitness of the individual and defaults to returning None. A function, evaluate_behavior, that calculates the behavior descriptor of the individual and defaults to returning None. The original evaluate, defaulting to calling evaluate_fitness and evaluate_behavior and returning them as a tuple.

Individual: evaluate changed to store the respective return values in fitness and behavior_descriptor.

Then operators to join or replace fitness with some function of behavior_descriptor.

This would let the user implement any combination of behavior and fitness. If they want a normal optimization algorithm, they can simply implement evaluate_fitness. If they want to determine behavior separately, or are just doing a diversity based approach, implement evaluate_behavior. If the two have to be determined jointly, they can override evaluate and ignore the other two functions. Migrating existing code would just require renaming evaluate on existing Problem implementations to evaluate_fitness.
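Roughly, as a sketch with placeholder names (these are not LEAP's actual classes, just the shape of the interface I have in mind):

    class Problem:
        """Placeholder illustrating the proposed split."""

        def evaluate_fitness(self, phenome):
            # Traditional quality measure; None for diversity-only algorithms
            return None

        def evaluate_behavior(self, phenome):
            # Behavior descriptor; None for plain optimization problems
            return None

        def evaluate(self, phenome):
            # Default: evaluate the two independently; override when they must
            # be computed jointly from the same simulation run
            return self.evaluate_fitness(phenome), self.evaluate_behavior(phenome)


    class Individual:
        """Placeholder showing where the two return values would be stored."""

        def __init__(self, phenome, problem):
            self.phenome = phenome
            self.problem = problem
            self.fitness = None
            self.behavior_descriptor = None

        def evaluate(self):
            self.fitness, self.behavior_descriptor = self.problem.evaluate(self.phenome)
            return self.fitness, self.behavior_descriptor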

lukepmccombs commented 1 year ago

Another associated issue: how do we dictate maximize for the Problem after changing fitness according to some function of behavior? It's common practice to use the diversity value to turn single-objective problems into multi-objective ones, which brings along more issues if the underlying Problem was single objective.

A very invasive solution is to merge MultiObjectiveProblem and ScalarProblem using the principle that dominance degenerates to plain greater-than in the single-objective case (this is a core principle of UNSGA-III). Done right this shouldn't kludge anything, but the issue of how to gracefully override maximize still remains.
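For illustration, a toy dominance check (not LEAP's actual comparator) that degenerates to plain greater-than when there is a single objective:

    import numpy as np

    def dominates(a, b):
        # Pareto dominance over maximized objective vectors: a dominates b when
        # it is at least as good everywhere and strictly better somewhere
        a, b = np.atleast_1d(a), np.atleast_1d(b)
        return bool(np.all(a >= b) and np.any(a > b))

    # With one objective this reduces to plain greater-than:
    assert dominates(3.0, 2.0) == (3.0 > 2.0)
    assert dominates(2.0, 3.0) == (2.0 > 3.0)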

SigmaX commented 1 year ago

So, first of all, I love this (I have a soft spot in my heart for QD algorithms).

When we tried (and then reverted) an overhaul of the Problem.evaluate() interface last year (see #193), we settled on the idea of relying on **kwargs when fitness evaluation requires information beyond just the phenome that comes out of the decoder.

Is your point here that using that approach requires subclassing Individual, which can pose a problem for other reasons (perhaps related to #278)?

My own intuition was to avoid subclasses altogether wherever possible in LEAP (I tend to strongly favor composition over inheritance). @markcoletti has made a good case for the usefulness of Individual subclasses though...

lukepmccombs commented 1 year ago

Composition, i.e. building alternatives to Individual and Problem, is an option, although it would require re-implementing everything that makes comparators and evaluation painless.

The issue with using the existing .evaluate() is that, under distributed evaluation, all parameters are copied to the worker without being returned or used to update the originals. For QD algorithms that translate behavior into fitness statefully, like Novelty Search and Surprise Search, this prevents them from learning new behaviors. Dask has some methods for passing stateful objects, but then a QD implementation would need separate code paths for single-threaded and distributed execution.

Also, yes, subclassing Individual would hit a stopping point here, since you'd need that shared object.

lukepmccombs commented 1 year ago

Another option is to support dictionary return values for .evaluate(). The default case for non-dictionary return values could be the existing assignment to fitness, while for dictionaries the keys could either be added to the namespace of the Individual or the dictionary kept as a member variable (in namespace form or as-is). This is about as adaptive a solution as you can get. It could also go the gym route and make the return value a (fitness, info) tuple for the user to pass information out.
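As a hypothetical sketch of how the dictionary branch could work (illustrative names, not current LEAP code):

    class Individual:
        def __init__(self, phenome, problem):
            self.phenome = phenome
            self.problem = problem
            self.fitness = None
            self.evaluation = None

        def evaluate(self):
            result = self.problem.evaluate(self.phenome)
            if isinstance(result, dict):
                # Keep the whole dictionary and pull out fitness if present
                self.evaluation = result
                self.fitness = result.get("fitness")
            else:
                # Existing behavior: the return value is the fitness
                self.fitness = result
            return self.fitness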

Edit: For a reference point, the Deep RL library Ray Train uses a similar interface to control multiple actors.

lukepmccombs commented 1 year ago

This last option is growing on me. It opens opportunities for adaptive fitness by separating where the evaluation is stored from the fitness property. For instance, rather than returning a fitness at evaluation, the user could return desirable characteristics and subclass Individual to override the .fitness property with some transformation thereof. In the case of a QD algorithm like Novelty Search, this could mean providing the Individual with an NS object and querying the archive at the time fitness is accessed.
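As a rough sketch of that last idea (the archive object and its interface are invented purely for illustration, and this isn't tied to LEAP's actual classes):

    import numpy as np

    class NoveltyIndividual:
        """Sketch: fitness is computed lazily from a shared archive when it
        is accessed, rather than being fixed at evaluation time."""

        def __init__(self, behavior, archive, k=15):
            self.behavior = np.asarray(behavior)  # behavior descriptor from evaluation
            self.archive = archive                # shared list of archived behavior descriptors
            self.k = k

        @property
        def fitness(self):
            # Novelty = mean distance to the k nearest archived behaviors,
            # so the value changes as the archive grows
            dists = np.linalg.norm(np.asarray(self.archive) - self.behavior, axis=1)
            return float(np.mean(np.sort(dists)[:self.k]))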

SigmaX commented 1 year ago

@markcoletti and I were chatting about this—what about treating fitness & novelty/diversity evaluation as entirely separate processes? So, the usual evaluation operator/Problem approach would evaluate fitness/quality, but a separate operator would be used to evaluate novelty/diversity.

Then some third operator (?) would synthesize the two in an algorithm-specific way.
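Schematically I'm picturing something like this, with toy operators over a plain list of individuals (the names and the archive interface are made up, not real LEAP operators):

    def evaluate_fitness_op(population, problem):
        # Usual evaluation: the Problem provides the quality signal
        for ind in population:
            ind.fitness = problem.evaluate(ind.phenome)
        return population

    def evaluate_diversity_op(population, archive):
        # Separate operator: a shared archive provides the novelty/diversity signal
        for ind in population:
            ind.diversity = archive.novelty(ind.phenome)
        return population

    def synthesize_op(population, weight=0.5):
        # Third, algorithm-specific operator that combines the two signals
        for ind in population:
            ind.fitness = (1 - weight) * ind.fitness + weight * ind.diversity
        return population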

The question, I suppose, is whether an approach like this would be flexible enough to model all the major QD algorithms—ex. novelty search with LS, surprise search, MAP-elites, etc. (incidentally I have always viewed MAP-elites as a form of island model, but that's a discussion for another time).

lukepmccombs commented 1 year ago

I don't believe that approach is flexible enough, as you surmised. The trouble is that if the QD algorithm uses a behavior descriptor drawn from the observed states, or has to observe them itself (like surprise search does), then quality and diversity, or at the very least the values used to determine them, have to be evaluated simultaneously.

On a local branch I've given a trial run at replacing Individual.fitness with a new variable Individual.evaluation and the following property:

    @property
    def fitness(self):
        # Mapping-like evaluation (e.g. a dict) with an explicit "fitness" key
        if hasattr(self.evaluation, "__getitem__") and "fitness" in self.evaluation:
            return self.evaluation["fitness"]
        # Object exposing a fitness attribute
        elif hasattr(self.evaluation, "fitness"):
            return self.evaluation.fitness
        # Otherwise the evaluation itself is the raw fitness
        return self.evaluation

This lets you return a dictionary containing fitness, an object with a fitness attribute, or a raw fitness, which I think would cover quite a few base cases. Obviously there are some edge cases, but this was just to demonstrate how flexible we could make it. If we just wanted a super simple base case, .fitness could easily be made an alias for .evaluation and function exactly as before.
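For instance, here are toy values for the three shapes (purely illustrative), each of which would yield a fitness of 1.5 through the property above:

    from types import SimpleNamespace

    # Toy values illustrating the three evaluation shapes the property accepts
    evaluations = [
        {"fitness": 1.5, "behavior": [0.2, 0.7]},  # dictionary containing a fitness key
        SimpleNamespace(fitness=1.5),              # object exposing a fitness attribute
        1.5,                                       # raw fitness, returned unchanged
    ]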

For actual usage, I've been working on an implementation of BRNS. I created a subclass of Individual that takes in a class that handles all of the memory / evaluation for determining the diversity of an individual. Then, I overrode .fitness and evaluate() as follows:

    @property
    def fitness(self):
        # Translate the stored evaluation into a (quality, diversity) objective pair
        quality, diversity = self.brns.quality_diversity(
            self.evaluation["quality"], self.evaluation["behavior"])
        if self.global_quality:
            # Optionally use the raw mean quality rather than the BRNS-derived one
            quality = np.mean(self.evaluation["quality"])
        return np.array([quality, diversity])

    def evaluate(self):
        # Evaluate as usual, then record the observation for the BRNS train step
        evaluation = super().evaluate()
        self.brns.push_observation(self.evaluation["quality"], self.evaluation["behavior"])
        return evaluation

I also included the BRNS train step as an operator in my pipeline. As you can see, it's relatively painless.

As a note, there was a lot of accompanying boilerplate due to the Individual parameterization issue.

lukepmccombs commented 1 year ago

In theory the changes I've made could live in a subclass of their own, although the essentials would need to be reimplemented for DistributedIndividual etc. Maybe an AdaptiveIndividual class?

SigmaX commented 1 year ago

Ah, right—we need to collect behavior information from the same "simulation run" as fitness information in any given case, so they really do need to be done at the same time.

lukepmccombs commented 1 year ago

I am working on an extra package for QD algorithms. My current solution is a mixin that redirects the result from problem.evaluate() to .evaluation instead of .fitness (preconstructed for all of our Individual subclasses), then leaves determining and assigning fitness up to the algorithm / user. This is low-code and fairly simple to use, although I am still working out some kinks.
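Roughly along these lines, as a simplified sketch (the actual mixin and the pieces around it carry more machinery than shown here):

    class EvaluationMixin:
        """Sketch: store the raw result of problem.evaluate() on .evaluation
        and leave computing / assigning .fitness to the algorithm or user."""

        def evaluate(self):
            # Redirect the evaluation result away from .fitness
            self.evaluation = self.problem.evaluate(self.phenome)
            return self.evaluation

    # Pre-constructed variants would then be built per Individual subclass,
    # e.g. class QDIndividual(EvaluationMixin, Individual): ...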