lukepmccombs opened 1 year ago
Another associated issue is how we dictate `maximize` for the `Problem` after changing fitness as a function of behavior. It's common practice to use the diversity value to turn single-objective problems into multi-objective ones, which brings along more issues if the underlying `Problem` was single-objective.
A very invasive solution is to merge `MultiObjectiveProblem` and `ScalarProblem`, using the principle that dominance degenerates to plain greater-than in the single-objective case (this is a core principle of U-NSGA-III). Done right, this shouldn't kludge anything, but the issue still remains of how to gracefully override `maximize`.
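For illustration, here's a minimal sketch (not LEAP's actual comparator API, and a maximization convention assumed) of how Pareto dominance reduces to plain greater-than when there is only one objective:

```python
def dominates(a, b):
    """True if fitness vector `a` Pareto-dominates `b` (maximization).

    With a single objective, this reduces to plain greater-than.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better

# Single-objective: dominance is exactly >
single = dominates([3.0], [2.0])
# Multi-objective: better in one objective, no worse in the other
multi = dominates([1.0, 2.0], [1.0, 1.0])
```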
So, first of all, I love this (I have a soft spot in my heart for QD algorithms).
When we tried (and then reverted) an overhaul of the `Problem.evaluate()` interface last year (see #193), we settled on the idea of relying on `**kwargs` when fitness evaluation requires information beyond just the phenome that comes out of the decoder.

Is your point here that using that approach requires subclassing `Individual`, which can pose a problem for other reasons (perhaps related to #278)?
My own intuition was to avoid the idea of subclasses altogether wherever possible in leap (as I tend to strongly favor composition in place of inheritance). @markcoletti has made a good case for the usefulness of `Individual` subclasses though...
Composition, or building alternates to `Individual` and `Problem`, is an option. However, this would require re-implementing everything that makes comparators and evaluation painless.
The issue with using the existing `.evaluate()` is that when doing distributed evaluation, all parameters are copied without being returned or updating the original. For QD algorithms that translate behavior into fitness statefully, like Novelty Search and Surprise Search, this prevents them from learning new behaviors. Dask has some methods to pass stateful objects, but then a QD implementation would need separate code for single- vs. multi-threaded execution.
Also, yes, subclassing `Individual` would hit a stopping point here, since you'd need that shared object.
Another option is to support dictionary return values for `.evaluate()`. The default case for non-dictionary return values could be the existing assignment to `fitness`, while in other cases keys either get added to the namespace of the `Individual` or the dictionary is kept as a member variable (in namespace form or original). This is about as adaptive a solution as you can get. It could also go the `gym` route and make the return value a tuple of `(fitness, info)` for the user to pass information out.
Edit: For reference, the deep RL library Ray Train uses a similar interface to control multiple actors.
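To make the dispatch concrete, a rough sketch (the helper name and semantics here are my own invention, not an existing leap API): a dict assigns its `fitness` key and merges the rest into the individual's namespace, a gym-style pair unpacks to `(fitness, info)`, and anything else is assigned as fitness like today.

```python
from types import SimpleNamespace

def assign_evaluation(individual, result):
    """Hypothetical dispatch on the return value of Problem.evaluate()."""
    if isinstance(result, dict):
        result = dict(result)  # don't mutate the caller's dict
        individual.fitness = result.pop("fitness", None)
        individual.__dict__.update(result)  # remaining keys join the namespace
    elif isinstance(result, tuple) and len(result) == 2:
        individual.fitness, individual.info = result  # gym-style (fitness, info)
    else:
        individual.fitness = result  # plain fitness, as today

# Usage with a lightweight stand-in for Individual
ind = SimpleNamespace()
assign_evaluation(ind, {"fitness": 1.0, "behavior": [0.2, 0.8]})
ind2 = SimpleNamespace()
assign_evaluation(ind2, (2.0, {"steps": 5}))
```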
This last option is growing on me. It opens opportunities for adaptive fitness by separating where the evaluation is stored from the fitness property. For instance, rather than returning a fitness at evaluation, the user could return desirable characteristics and subclass `Individual` to override the `.fitness` property with some transformation thereof. In the case of QD like Novelty Search, this could mean providing the `Individual` with an NS object and querying the archive at the time the fitness is accessed.
@markcoletti and I were chatting about this—what about treating fitness & novelty/diversity evaluation as entirely separate processes? So, the usual evaluation operator/`Problem` approach would evaluate fitness/quality, but a separate operator would be used to evaluate novelty/diversity. Then some third operator (?) would synthesize the two in an algorithm-specific way.
The question, I suppose, is whether an approach like this would be flexible enough to model all the major QD algorithms—ex. novelty search with LS, surprise search, MAP-elites, etc. (incidentally I have always viewed MAP-elites as a form of island model, but that's a discussion for another time).
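To make that third operator concrete, a hypothetical sketch (all names invented here, not leap's operator API): it assumes the quality and diversity operators have already stashed their values on each individual, and simply folds them together.

```python
from types import SimpleNamespace

def synthesize_fitness(population, combine=lambda q, d: (q, d)):
    """Hypothetical pipeline operator: fold separately-evaluated quality
    and diversity into each individual's fitness, algorithm-specifically
    via `combine`."""
    for ind in population:
        ind.fitness = combine(ind.quality, ind.diversity)
        yield ind

# Assumed lightweight individuals with quality/diversity already set
pop = [SimpleNamespace(quality=1.0, diversity=0.5)]
result = list(synthesize_fitness(pop))
```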
I don't believe that approach is flexible enough as you surmised. The trouble is if the QD algorithm uses a behavior descriptor that is drawn from the observed states, or has to observe them itself like surprise search, then quality and diversity, or at the very least the values used to determine them, have to be evaluated simultaneously.
On a local branch I've given a trial run at replacing `Individual.fitness` with a new variable, `Individual.evaluation`, and the following property:
```python
@property
def fitness(self):
    if hasattr(self.evaluation, "__getitem__") and "fitness" in self.evaluation:
        return self.evaluation["fitness"]
    elif hasattr(self.evaluation, "fitness"):
        return self.evaluation.fitness
    return self.evaluation
```
This lets you return a dictionary containing fitness, a class with a `fitness` property, or a raw fitness, which I think would cover quite a few base cases. There are obviously some edge cases, but this was just to demonstrate how flexible we could make it. If we just wanted a super simple base case, it could easily be made an alias for `.evaluation` and function exactly like before.
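A quick check of the three cases against a minimal stand-in class (not LEAP's `Individual`) that carries just `.evaluation` and the property above:

```python
from types import SimpleNamespace

class EvaluatedIndividual:
    """Minimal stand-in for the proposal, for illustration only."""
    def __init__(self, evaluation):
        self.evaluation = evaluation

    @property
    def fitness(self):
        if hasattr(self.evaluation, "__getitem__") and "fitness" in self.evaluation:
            return self.evaluation["fitness"]
        elif hasattr(self.evaluation, "fitness"):
            return self.evaluation.fitness
        return self.evaluation

as_dict = EvaluatedIndividual({"fitness": 1.0, "behavior": [0.3]}).fitness
as_attr = EvaluatedIndividual(SimpleNamespace(fitness=2.0)).fitness
as_raw = EvaluatedIndividual(3.0).fitness
```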
For actual usage, I've been working on an implementation of BRNS. I created a subclass of `Individual` that takes in a class handling all of the memory and evaluation for determining the diversity of an individual. Then I overrode `.fitness` and `evaluate()` as follows:
```python
@property
def fitness(self):
    quality, diversity = self.brns.quality_diversity(
        self.evaluation["quality"], self.evaluation["behavior"])
    if self.global_quality:
        quality = np.mean(self.evaluation["quality"])
    return np.array([quality, diversity])

def evaluate(self):
    evaluation = super().evaluate()
    self.brns.push_observation(
        self.evaluation["quality"], self.evaluation["behavior"])
    return evaluation
```
And included the BRNS train step as an operator in my pipeline. As you can see, relatively painless.
As a note, there was a lot of accompanying boilerplate due to the `Individual` parameterization issue.

In theory the changes I've made could be done in a subclass of its own, although the essentials would need to be reimplemented for `DistributedIndividual` etc. Maybe an `AdaptiveIndividual` class?
Ah, right—we need to collect behavior information from the same "simulation run" as fitness information in any given case, so they really do need to be done at the same time.
I am working on an extra package for QD algorithms. My current solution is a mixin that redirects the result from `problem.evaluate()` to `.evaluation` instead of `.fitness` (preconstructed for all our `Individual` subclasses), then leaves determining and assigning fitness up to the algorithm/user. This is low-code and fairly simple to use, although I am still working out some kinks.
Quality Diversity algorithms have an additional characteristic recorded when evaluating individuals: their behavior descriptor. Depending on the algorithm, this can be translated into fitness in a variety of ways, and the process is almost always stateful. The current `Problem` cannot introduce new member variables to the `Individual` without subclassing `Individual` and altering the `evaluate` interface. Altering the `Problem` to translate behavior into fitness at evaluate time is not an option, since in distributed evaluation each problem would be accessing different states.

An option is to store the quality and behavior descriptors with fitness as returned by `.evaluate()`, then decompose and convert to fitness using an operator, but this isn't super appealing to me.

A pie-in-the-sky ask would be to alter the base `Problem` and `Individual` evaluate interfaces to support QD throughout. My idea is something along this line, barring more intuitive name choices:

`Problem`:
- A function, `evaluate_fitness`, that calculates the traditional fitness of the individual and defaults to returning `None`.
- A function, `evaluate_behavior`, that calculates the behavior descriptor of the individual and defaults to returning `None`.
- The original `evaluate`, defaulting to calling `evaluate_fitness` and `evaluate_behavior` and returning them as a tuple.

`Individual`:
- `evaluate` changed to store the respective return values to `fitness` and `behavior_descriptor`.

Then operators to join or replace `fitness` with some function of `behavior_descriptor`.

This would let the user implement any combination of behavior and fitness. If they want a normal optimization algorithm, they can simply implement `evaluate_fitness`. If they want to determine behavior separately, or are just doing a diversity-based approach, implement `evaluate_behavior`. If the two have to be determined jointly, they can override `evaluate` and ignore the other two functions. Migrating existing code would just require renaming `evaluate` on existing `Problem` implementations to `evaluate_fitness`.
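A sketch of what that split interface might look like (using the names above; the defaults and the example subclass are my own illustration, not LEAP's actual base classes):

```python
class Problem:
    def evaluate_fitness(self, phenome):
        return None  # traditional fitness; defaults to None

    def evaluate_behavior(self, phenome):
        return None  # behavior descriptor; defaults to None

    def evaluate(self, phenome):
        # Default: evaluate both and return them as a tuple. Jointly
        # determined cases override this and ignore the other two.
        return self.evaluate_fitness(phenome), self.evaluate_behavior(phenome)

class SphereProblem(Problem):
    """Migration example: a normal optimization problem only implements
    evaluate_fitness (here, negated sphere function for maximization)."""
    def evaluate_fitness(self, phenome):
        return -sum(x * x for x in phenome)

fit, behavior = SphereProblem().evaluate([1.0, 2.0])
```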