Swap to continous density field strategy for signal management

alice-i-cecile commented 1 year ago

After talking over the design and benchmarking results with @plof27 today, we've decided on a strategy to pursue for signal propagation (#1).

Key findings:

Signal diffusion takes wildly more time than any other operation. See the trace in #506.
Signals are accessed very rarely compared to diffusion costs. Consider a worst-case for look-up. Suppose we have 100 different signal types (very low), a unit on each tile (very high), and they must act each frame, following a signal gradient (very high). This gives us 7 lookups per tile. However, for diffusion, we have 7 * 100 lookups for each tile, and 100 mutations for each tile. This is backed up by the trace above using the MVP implementation.
Signals are generally quite sparse. In factory builders, you want a large variety of items and structures. Most of the time, that information is going to be irrelevant to anything that isn't fairly nearby: they'll always have something better to do. As a result, most signals should be treated as 0 in most places.
1. The actual signal propagation logic can be quite simple and still works well!

For some more realistic numbers from the initial MVP:

10% of tiles have units in them
actions take 0.5 seconds to complete on average
half of them involve signal gradient lookups
a new goal must be chosen (involving querying every signal on the tile) once every 5 seconds
each signal must be propagated to 5 neighbors (some cells are filled)
half of all tiles will have a signal emitter
1000 different signal types
signals are 0 in 99% of all tiles

For a discretized strategy, that gives us:

0.5 signal emissions per tile per signals pass
1000 signals 0.01 density 5 neighbors = 50 signal diffusions per tile per signals pass
1000 signals * 0.01 density = 10 decay and sparsification check per tile per signals pass
0.1 units per tile * (5 +1) neighbors / 0.5 seconds per action = 0.12 local gradient lookups for actions per tile per second
0.1 units per tile 1000 signals 0.01 density / 5 seconds = 0.2 goal lookups per tile per second

We can reduce how frequently our signals pass runs, but it probably can't be slower than once every half second or there will be perceptibly wrong results:

0.25 signal emissions per tile per second
25 signal diffusions per tile per second
2 decay and sparsification check per tile per signals pass
0.12 local gradient lookups for actions per tile per second
0.2 goal lookups per tile per second

Obviously signal diffusion is dominating, but decay matters too!

Ideally our map is:

10^3 tile map radius (big factory builder!)
10^6 tiles in size as a result (hey, that's still feasible with individual entities for tiles and structures!)

Every second, we must process 10^6 25 = 2.5 10^7 diffusion steps per second. Which gives us 4e-8 seconds for each, aka 40 nanoseconds. If we want a GPU with matrix acceleration, we can't easily take advantage of sparsity. Thankfully though, unlike traditional pathfininding, all operations are linear with respect to map size, and scale very slowly with unit count (because there's so many fewer operations).

So, instead of doing a discretized storage approach, we propose the use of a density field strategy. The fundamentals are fairly straightforward, and come from applied math.

Track relevant objects (signal emitters, barriers, special signal modifiers).
For each object type, determine a differential equation of how they affect the field strength near them.
Combine these equations to determine (for each signal type) an approximation that we can use to get the field strength at a position and time.
Use this approximation to lazily look-up the values when we need them.

We only actually care about the value of these signals where these fields are evaluated! And because they're evaluated so rarely (see above), we have a massive budget to work with.

Let's pessimistically round up to 0.5 lookups per tile per second. At 10^6 tiles, that means we have 2000 nanoseconds to do each computation! As a bonus, in many cases we only want very approximate results (one significant figure is basically fine for following a gradient upstream or choosing a goal at random), and so we can stop the computation very early.

Of course, the big challenge is actually writing the math to do this, especially as we add more game mechanics. But diffusion equations are a very well-studied field, so I'm feeling quite optimistic.

alice-i-cecile commented 1 year ago

Truly huge bases in Factorio look to be about 10k-100k tiles in width: getting even 1k width should still make the map feel sprawling given the increased diversity.

RobWalt commented 1 year ago

If we want to go the diffusion equation route, then it looks like a classical application of the heat equation plus some general decay. I can look into how we could implement it efficiently for this use case in the coming days when I'm back at home.

Just a general question: By signals we mean "ant signals", right? I was a bit confused at first because I thought the issue is related to event processing, but I think that's not it.

Also a general remark: Numerical simulations of differential equations are mostly implemented with arrays/vectors (afaik). This implies that we might need to refactor the signal management from the hashmap approach to a vectorized approach. I could imagine that this refactoring might result in some performance gains aswell, so it might be a good decision to split the issue into these two sub tasks and look at the benchmarks inbetween.

alice-i-cecile commented 1 year ago

Yep, heat equation plus decay is the plan :) The biggest stumbling blocks for me right now are:

Proof of concept of architecture based on the density field approach.
Modelling obstructions: signals must pass around structures so they still work for pathfinding.

References or prototypes on either would be super valuable; I'd love to collaborate on this work!

By signals we mean "ant signals", right?

These are the general purpose pathfinding + goal selection tool that we're using to drive AI here :) Might need a better name? The basic idea though is:

Everything gives off signals based on what it is, which diffuse to nearby tiles.
Allied structures give off push/pull signals requesting the items they want.
Units pick a goal by sampling their local signals.
Units walk up the signal gradient for pathfinding.

Also a general remark: Numerical simulations of differential equations are mostly implemented with arrays/vectors (afaik). This implies that we might need to refactor the signal management from the hashmap approach to a vectorized approach.

Definitely standard! Our needs are quite different here though: rather than one field that we care about modelling accurately everywhere, we have many fields that are zero in most places (because most items / structures will be clustered to little factory hubs). As a result, we probably won't actually benefit much from the increased cache locality of using something like an array or space filling curve.

Austreelis commented 1 year ago

I'm interested in this problem, I'll start looking at it in the next few days if that's alright :) I'm quite unfamiliar with emergence's simulation code, but it doesn't look too complex.

alice-i-cecile commented 1 year ago

All of the relevant code for this problem can be found in https://github.com/Leafwing-Studios/Emergence/blob/main/emergence_lib/src/signals.rs :)

RobWalt commented 1 year ago

Here are some theoretical thoughts on a simple first approach which doesn't need to make diffusion operation calculations each frame. The approach is based on the heat equation as mentioned above and treats all signals as diffusive signals. (For the future we might also look into other types of equations like the, for example, the wave equation for bird sound transmission or similar stuff). The approach also doesn't consider barriers or signal modifiers yet, but it should be extensible in that regard with some additional work.

A new approach

Let's start with some easy things: We are modelling diffusion + decay, so the approach will be parametrized by:

diffusion constant (this is k on wikipedia and is also called thermal conductivity there)
decay constant

The rest of the approach strictly follows the steps that are mentioned in the issues description:

Track relevant objects (signal emitters, barriers, special signal modifiers).

As mentioned in the starting paragraph, we simplify the model for now (no barriers or modifiers). The only properties of signals we need to track are:

position of signal emission
time of signal emission
initial signal strength

For each object type, determine a differential equation of how they affect the field strength near them.

Here we can utilize what's called a "fundamental solution" to the heat equation. It is basically a closed form solution (aka. function) which describes the values of the heat equation for arbitrary points in space and time. Here is what it looks like:

Combine these equations to determine (for each signal type) an approximation that we can use to get the field strength at a position and time.

For every signal type, we can create initial conditions (distribution function of the signal for t == 0) of the heat equations over the whole domain which can be an arbitrary function. We would craft this point wise with all the signal strengths from tiles of the same signal type, e.g.

g(x,y) = signal_strength[(x,y)]

This initial condition can be integrated into the fundamental solution of the heat equation to get a general solution of the form

Use this approximation to lazily look-up the values when we need them.

If a signal is accessed we basically need to do the following:

calculate the leftover signal strength that is still existing for all tiles with the same signal type taking decay into account
construct initial condition
evaluate specific closed form solution for that initial condition at
- x == accessed tile position
- t == time passed since signal emission

Practical notes

Batching over time

It's probably best to run multiple heat equation calculations for one cell and batch time with respect to time to avoid weirdness. The batched calculations can then be averaged to produce a compound result. The following scenario tries to explain why this might be important. Think of:

one signal which was emitted 1 minute ago 100m away from target tile
another signal which was emitted 1 second ago 1m away from target tile

If we use the approach from above and use t = 1 second, the signal that is far away would also only have 1 second to diffuse. This would probably imply that it wouldn't reach the target tile in any meaningful way. So it might be better to batch calculations for every 2 seconds or so. Additionally it might make sense to not consider any signal for the initial condition that is older than a threshold or further away than a given distance to prevent too many batches.

Evaluating the general solution

The general solution involves the convolution of two functions. This is a pretty complex mathematical operation which we can, however, calculate pretty efficiently by FFT. For those who haven't seen it yet, you can find a good 3B1B video on it here.

Also: Most of the fundamental solution can be calculated at compile time (there are many constants in the function) to reduce the computational footprint of the function further.

alice-i-cecile commented 1 year ago

Excellent starting analysis, thank you!

Leafwing-Studios / Emergence