Where I explore the possibility that there is a race condition for calculating the birth counter in the ASEA
Closed by markcoletti 1 year ago
Moved this to the backlog as it's not a priority. Though it's likely that there may be a race condition involving DistributedIndividual's birth_id, I suspect it's somewhat mitigated by how we currently use Dask in both the asynchronous and synchronous implementations. However, I will be occasionally assessing run results for evidence of such race conditions, as possibly evidenced by duplicate or missing birth IDs.
What was the observed behavior of the birth counter issue? I've found two things of note:
If you count the evaluated individuals, you will get init_pop_size + max_births + nonviable. This is because the initial population isn't counted toward the births, which is intentional per the commit message, and nonviable individuals are counted back down (i.e. they don't consume the birth budget).
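A small worked example of that bookkeeping, with a hypothetical number of nonviable individuals:

init_pop_size = 5     # evaluated, but never counted as births
max_births = 1000     # the birth budget
nonviable = 12        # hypothetical; these don't consume the budget
evaluated = init_pop_size + max_births + nonviable   # 1017 evaluations total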
There was a bug where viable individuals were counted twice, which was probably left over from when you looked into it before.
Essentially, I had observed duplicate or skipped IDs and reasoned that this was likely due to a race condition.
Actually, the initial population should have birth IDs automatically assigned when created. It's just that when we fall into the offspring creation loop we start counting from where we left off with the last ID from the initial population. And then there's the bookkeeping for whether we want non-viable individuals to count towards the birth budget.
I don't see any duplicate birth IDs, but I think I know what might be the cause of the skipped IDs. If you use the crossover operator but only pool 1 individual at a time in the pipeline, it discards the other individual. But that individual was still "birthed", and the ID counter is incremented for it. A quick test of changing the pool size to 2 resulted in the highest observed ID at the end of the evolution process being 1 less than the total number observed (i.e. the 0-indexed max, as you would get with itertools.count).
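For reference, a quick sketch of the itertools.count behavior mentioned above: IDs are 0-indexed, so when none are skipped the highest ID is exactly one less than the total assigned.

from itertools import count

ids = count()                                # 0-indexed birth IDs
assigned = [next(ids) for _ in range(10)]
assert max(assigned) == len(assigned) - 1    # highest ID == total - 1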
The other individual during crossover is not discarded, but is rather sent the next time that operator is pulsed. That's why it has two different yield statements. This is actually a significant difference between LEAP and DEAP: DEAP does discard one of the offspring, which introduces a source of genetic drift.
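A minimal toy sketch (not LEAP's actual operator) of the two-yield pattern being described; the generator's suspended frame holds the second offspring until the operator is pulsed again:

def two_yield_crossover(parents):
    while True:
        mom, dad = next(parents), next(parents)
        # ... recombination would happen here ...
        yield mom   # the first pulse delivers the first offspring
        yield dad   # the next pulse resumes here and delivers its sibling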
That is the case for the by-generational pipeline, but I don't believe it is so for the steady-state pipeline. Each time the algorithm goes to produce offspring, the pipeline is called again, re-instantiating the crossover operator's local frame. If you produce multiple offspring at once, they will all come from the same instance and receive the successive children, but not so between calls.
I don't think the crossover operator is re-instantiated between pipeline pulses; the operators are already instantiated when passed in via the offspring_pipeline argument to steady_state. You could probably cons up a quick test to observe what actually happens.
I made a test script with a stripped-down version of uniform_crossover. The number of instantiations increases equally with the number of births.
from toolz import curry
from leap_ec.ops import iteriter_op
from typing import Iterator
from distributed import Client
from leap_ec import ops
from leap_ec.representation import Representation
from leap_ec.binary_rep.problems import MaxOnes
from leap_ec.binary_rep.initializers import create_binary_sequence
from leap_ec.distrib.individual import DistributedIndividual
from leap_ec.distrib import asynchronous

MAX_BIRTHS = 1000
instantiation_count = 0


@curry
@iteriter_op
def dummy_crossover(next_individual: Iterator,
                    p_swap: float = 0.2, p_xover: float = 1.0) -> Iterator:
    """
    Structurally identical to uniform_crossover, except all crossover logic
    was removed. The function signature (i.e. params) was kept so curry
    behaves the same.
    """
    global instantiation_count
    instantiation_count += 1
    while True:
        parent1 = next(next_individual)
        parent2 = next(next_individual)
        yield parent1
        yield parent2


if __name__ == "__main__":
    pipeline = [
        ops.random_selection,
        dummy_crossover,
        ops.pool(size=1)
    ]

    with Client(n_workers=20) as client:
        final_pop = asynchronous.steady_state(
            client, max_births=MAX_BIRTHS, init_pop_size=5, pop_size=3,
            representation=Representation(
                initialize=create_binary_sequence(4),
                individual_cls=DistributedIndividual
            ),
            problem=MaxOnes(),
            offspring_pipeline=pipeline
        )

    print(f"Max Births: {MAX_BIRTHS}")
    print(f"Instantiation Count: {instantiation_count}")

# Output:
# Max Births: 1000
# Instantiation Count: 1000
This is with my local branch that removes the double count on viable individuals.
Thanks for the test snippet! That's disappointing. (And fortunate that I don't generally use crossover!) We can discuss possible workarounds on Slack. This may also have ramifications for other pipeline operators, though I can't think offhand of likely candidates.
Here is an example higher-order function that could fix it. It keeps children in a local deque before they are yielded. The way yield works prevents it from making excess children (execution only continues when the next value is requested), and on future calls any leftover children are yielded first.
from collections import deque
from typing import Iterator

from leap_ec.ops import iteriter_op


def dummy_crossover():
    next_children = deque()

    @iteriter_op
    def _crossover_op(next_individual) -> Iterator:
        while True:
            # Yield any children left over from a previous call first
            while next_children:
                yield next_children.popleft()
            parent1 = next(next_individual)
            parent2 = next(next_individual)
            next_children.extend((parent1, parent2))

    return _crossover_op
@lukepmccombs Oh, yikes. So, am I understanding that we may have an issue where our code assumes that toolz.curry() acts like a closure (returning a single instance, so if it's called multiple times we can yield multiple values), but in reality it ends up re-instantiating every time it is executed?
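For illustration, a minimal toy sketch of that suspicion: toolz.curry returns a partially applied function, not a generator, so every pipeline application builds a brand-new generator, and any pending second yield in the old one is thrown away.

from itertools import count
from toolz import curry

@curry
def toy_crossover(next_individual, p_xover=1.0):
    while True:
        mom, dad = next(next_individual), next(next_individual)
        yield mom
        yield dad   # only reached if the SAME generator is pulsed twice

parents = count()                # stand-in parent stream: 0, 1, 2, ...
op = toy_crossover(p_xover=1.0)  # the curried operator, as in a pipeline

print(next(op(parents)))  # 0 -- fresh generator; 1 is consumed, then lost
print(next(op(parents)))  # 2 -- another fresh generator, not the pending 1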
I just reviewed your PR, but if it's aiming to solve the observation here, I'm not sure why the deque is needed. What about a closure that behaves the way the @curry was intended to?
def dummy_crossover():
    next_children = deque()

    @iteriter_op
    def _crossover_op(next_individual) -> Iterator:
        while True:
            while next_children:
                yield next_children.popleft()
            parent1 = next(next_individual)
            parent2 = next(next_individual)
            yield parent1
            yield parent2

    return _crossover_op
The closure is needed to persist next_children's contents between calls. The individuals have to pass through the deque in order to be caught whenever the function closes; if we use a raw yield with the children, not having come from the deque, then if collection ends between the first and second child, the second will be lost.
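A quick toy sketch (string stand-ins for individuals) of why routing through the deque matters: a child parked in the deque survives the generator being dropped, whereas a child pending on a raw yield does not.

from collections import deque

def make_deque_op():
    next_children = deque()
    def _op(parents):
        while True:
            while next_children:
                yield next_children.popleft()
            next_children.extend((next(parents), next(parents)))
    return _op

parents = iter(["a", "b", "c", "d"])
op = make_deque_op()
print(next(op(parents)))  # 'a' -- 'b' stays parked in the deque
print(next(op(parents)))  # 'b' -- recovered from the deque, not lost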
Here is another version of the function that may be clearer (deque isn't particularly useful here; I thought it worked differently than it does. It technically is still appropriate, but so few items are in the queue at a time that its merits don't show):
def dummy_crossover():
    second_child = None

    @iteriter_op
    def _crossover_op(next_individual) -> Iterator:
        nonlocal second_child
        if second_child is not None:
            # Return a child left over from a previous run.
            # The swap with None has to happen before the yield to be
            # certain it's executed.
            ret_child, second_child = second_child, None
            yield ret_child
        while True:
            parent1 = next(next_individual)
            parent2 = next(next_individual)
            # The second child needs to be stored before yielding the first
            # child (parent1). Generators only execute code when necessary,
            # so children will only be generated if the first child needs to
            # be yielded. That's why the first child doesn't need to be
            # stored in the closure as well.
            second_child = parent2
            yield parent1
            # Remove the second child from the closure and yield it if the
            # generator does get asked for it.
            ret_child, second_child = second_child, None
            yield ret_child

    return _crossover_op
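A hypothetical usage sketch of that version: pulling a single child per pipeline call no longer loses its sibling, because the sibling is parked in the closure between calls.

op = dummy_crossover()
parents = iter(["a", "b", "c", "d"])
print(next(op(parents)))  # 'a' -- 'b' is held in second_child
print(next(op(parents)))  # 'b' -- the leftover sibling is yielded first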
Copying my comment from #273 for summary purposes:
The observation is that when our crossover operators are used in a steady-state population model specifically, then because we are generating a single offspring at a time in each evolutionary step, we end up losing the functionality of generating a pair of offspring for each pair of parents.
So the original operators work as designed: one instance will correctly generate two complementary offspring when you call it twice. The problem is that we only call the pipeline once in each step for steady-state algorithms, so the second offspring gets created but never yielded (thus inflating our birth count).
The issue is present in all of our pipelines, just most prevalent in steady state. It occurs when the pipeline is made to collect an odd number of individuals, which is almost always 1 in the case of steady state. An individual is birthed by the crossover operator, but discarded before it is consumed by a pipeline operator further down. For instance, if you were pooling a population of size 5, like in the neural network cartpole example, 1 in 6 births would be discarded.
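A small sketch of that accounting, assuming an operator that births two children per pulled pair:

pool_size = 5                          # as in the cartpole example
pairs_needed = (pool_size + 1) // 2    # 3 pulses of the crossover operator
births = 2 * pairs_needed              # 6 children are birthed...
discarded = births - pool_size         # ...but 1 is discarded: 1 in 6 births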
The operators are called once per pipeline execution. That is every evaluation (and subsequent offspring birthing) for steady state, and once per generation for the generational EA. It is possible to bypass this issue for either by just using an even-numbered pool size, but in the case of steady state that results in an explosion of individuals in the queue.
This issue is resolved. Closing.