FLAMEGPU / FLAMEGPU2

FLAME GPU 2 is a GPU accelerated agent based modelling framework for CUDA C++ and Python
https://flamegpu.com
MIT License

Dependency Analysis #229

Open ptheywood opened 4 years ago

ptheywood commented 4 years ago

Rather than users having to describe models using explicit layers, using dependencies instead would allow the FLAME GPU library to determine the layers of execution, potentially improving performance through improved concurrency and reduced synchronisation points.

A directed acyclic graph (DAG) of the dependency relations would need to be constructed from the user-provided dependencies between functions (via a new API).

An algorithm would then have to be applied to flatten the DAG into execution layers and synchronisation points. Alternatively this might map nicely to CUDA Graphs, although currently multiple graphs would be needed and this might not actually improve performance.

An initial implementation would include a dependency graph which is flattened to a set of layers for execution. Longer term, a dependency graph could be concretised into an execution graph, used within CUDAAgentModel rather than layers, although this would likely still need to form layers for concurrency.

ptheywood commented 4 years ago

Some current thoughts on how dependency analysis might work, in the context of a model implementation.

This only covers an example model with one agent type in one state, so will need expanding to cover multiple agent types, and/or multiple states, but the general idea will likely remain the same.

The example is a modified version of the circles model, splitting the move function into two parts to make the dependencies a bit more interesting, and adding a host function:

In practice, the execution layers should be:

This assumes host_function could run in parallel with circle_move. In practice this is likely to require its own layer.

The DAG would be:

    circle_output_location
             |
             v
    circle_input_location
      |               |
      v               v
circle_move     host_function

The implementation would approximately be:


// Agent functions
FLAMEGPU_AGENT_FUNCTION(circle_output_location, FLAMEGPU::message::None, FLAMEGPU::message::BruteForce) { /* ... */ }
FLAMEGPU_AGENT_FUNCTION(circle_input_location, FLAMEGPU::message::BruteForce, FLAMEGPU::message::None) { /* ... */ }
FLAMEGPU_AGENT_FUNCTION(circle_move, FLAMEGPU::message::None, FLAMEGPU::message::None) { /* ... */ }

// host functions.
FLAMEGPU_HOST_FUNCTION(host_function) { /* ... */ }

int main(int argc, char*argv[]){
    // New model description
    FLAMEGPU::model::ModelDescription model("circles");

    // Define the environment which is common to the model, with the default values.
    EnvironmentDescription &environment = model.Environment();
    environment.newVariable<int>("seed", 0);
    environment.newVariable<int>("width", 1024);

    // Declare agent types
    FLAMEGPU::model::AgentDescription &circleAgent = model.newAgent("circle");
    // Id is automatic.
    circleAgent.newVariable<float>("x");
    // ...
    circleAgent.newState("default");
    // ...

    // Declare messages
    FLAMEGPU::model::MessageDescription &locationMessage = model.newMessage("location");
    locationMessage.newVariable<float>("id"); // id is not automatic?
    locationMessage.newVariable<float>("x");
    // ...

    // Add output_location function to the circle agent.
    FLAMEGPU::model::AgentFunctionDescription fn_circle_output = circleAgent.newFunction("output_location", circle_output_location);
    // State which message it produces as output - an implicit dependency
    fn_circle_output.setMessageOutput(locationMessage);

    // input_location
    FLAMEGPU::model::AgentFunctionDescription fn_circle_input = circleAgent.newFunction("input_location", circle_input_location);
    // State which message it takes as input - an implicit dependency
    fn_circle_input.setMessageInput(locationMessage);

    // Add the circle movement function
    FLAMEGPU::model::AgentFunctionDescription fn_circle_move = circleAgent.newFunction("move", circle_move);

    // Host functions, which exist within layers
    FLAMEGPU::model::HostFunctionDescription fn_host_function = model.newHostFunction(host_function);

    // Dependencies can then be specified, via the description objects
    fn_circle_output.dependsOn();  // no dependencies - a root of the DAG
    fn_circle_input.dependsOn(fn_circle_output);
    fn_circle_move.dependsOn(fn_circle_input);
    fn_host_function.dependsOn(fn_circle_input);

    // ...

}

The DAG would be validated (checked for loops) and flattened into layers (using some form of topological sorting algorithm, but with branching; Coffman-Graham perhaps?) when the model is made concrete.
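As a rough sketch of what that validation and flattening might look like (a simple Kahn-style topological layering rather than Coffman-Graham; the string-keyed graph representation here is purely illustrative and not a proposed FLAME GPU class):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Flatten a DAG (node -> set of nodes it depends on) into execution layers.
// Each layer contains every node whose dependencies all sit in earlier layers.
// Returns an empty result if the graph contains a cycle (validation failure).
std::vector<std::set<std::string>> flattenToLayers(
    const std::map<std::string, std::set<std::string>> &dependsOn) {
    std::vector<std::set<std::string>> layers;
    std::set<std::string> placed;
    std::size_t remaining = dependsOn.size();
    while (remaining > 0) {
        std::set<std::string> layer;
        for (const auto &node : dependsOn) {
            if (placed.count(node.first)) continue;
            bool ready = true;
            for (const auto &dep : node.second)
                if (!placed.count(dep)) { ready = false; break; }
            if (ready) layer.insert(node.first);
        }
        if (layer.empty()) return {};  // no node is ready: a cycle exists
        for (const auto &n : layer) placed.insert(n);
        remaining -= layer.size();
        layers.push_back(layer);
    }
    return layers;
}
```

Fed the circles DAG above, this would produce three layers, with circle_move and host_function sharing the final one (ignoring, as noted, that the host function may really need its own layer).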

It would likely be worth adding an interface which is implemented by the various Description classes, to provide the dependency support.
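For example (a hypothetical sketch only; the class and method names are illustrative, not a proposed API), such an interface might do little more than record and expose the declared dependencies:

```cpp
#include <vector>

// Hypothetical mixin providing dependency support; AgentFunctionDescription,
// HostFunctionDescription etc. could each inherit from it.
class DependencyNode {
 public:
    virtual ~DependencyNode() = default;
    // Declare that this node must execute after the given node.
    void dependsOn(DependencyNode &other) { dependencies.push_back(&other); }
    // Calling with no arguments marks the node as a root of the DAG.
    void dependsOn() { dependencies.clear(); }
    const std::vector<DependencyNode *> &getDependencies() const {
        return dependencies;
    }
 private:
    std::vector<DependencyNode *> dependencies;
};
```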

Several dependsOn() prototypes could exist:

Internally to the ModelDescription, adding message inputs/outputs would add to the draft dependency graph. This is not going to be enough information in all cases. E.g. where multiple agent functions output to the same message list, what order do they go in?

The same would apply for states etc.

ptheywood commented 4 years ago

DAG for PedestrianNavigation

Example of dependencies for the PedestrianNavigation example of FLAME GPU 1.

FLAME GPU 1 Layers

<layers>
    <layer>
      <gpu:layerFunction>
        <name>generate_pedestrians</name>
      </gpu:layerFunction>
    </layer>
    <layer>
      <gpu:layerFunction>
        <name>output_pedestrian_location</name>
      </gpu:layerFunction>
      <gpu:layerFunction>
        <name>output_navmap_cells</name>
      </gpu:layerFunction>
    </layer>
    <layer>
      <gpu:layerFunction>
        <name>avoid_pedestrians</name>
      </gpu:layerFunction>
    </layer>
    <layer>
      <gpu:layerFunction>
        <name>force_flow</name>
      </gpu:layerFunction>
    </layer>
    <layer>
      <gpu:layerFunction>
        <name>move</name>
      </gpu:layerFunction>
    </layer>
</layers>

ModelDescription model = // ...

// Agents
AgentDescription a_pedestrian = // ...
AgentDescription a_navmap = // ...

// message lists
MessageDescription m_pedestrian_location = // ...
MessageDescription m_navmap_cell = // ...

// Agent functions without any dependencies specified

AgentFunctionDescription f_generate_pedestrians = a_navmap.newFunction(/* ... */);
AgentFunctionDescription f_output_navmap_cells = a_navmap.newFunction(/* ... */);

AgentFunctionDescription f_output_pedestrian_location = a_pedestrian.newFunction(/* ... */);
AgentFunctionDescription f_avoid_pedestrians = a_pedestrian.newFunction(/* ... */);
AgentFunctionDescription f_force_flow = a_pedestrian.newFunction(/* ... */);
AgentFunctionDescription f_move = a_pedestrian.newFunction(/* ... */);

// Specify message output/input (implicit dependencies)
f_output_navmap_cells.setMessageOutput(m_navmap_cell);
f_output_pedestrian_location.setMessageOutput(m_pedestrian_location);

f_avoid_pedestrians.setMessageInput(m_pedestrian_location);
f_force_flow.setMessageInput(m_navmap_cell);

// Specify agent output (implicit dependency / concurrency blocker)
f_generate_pedestrians.setAgentOutput(a_pedestrian);

// Set agent death (blocks concurrency)
f_force_flow.setAllowAgentDeath(true);

Following on from the implicit dependencies which are already defined, explicit dependencies are required to instantiate the model, as the order of execution is otherwise unclear.

The simplest approach is to require at least one explicit dependency for each function, making the DAG-to-layers algorithm simple:

f_generate_pedestrians.dependsOn(); // Identifies this as layer 0
f_output_pedestrian_location.dependsOn(f_generate_pedestrians);
f_output_navmap_cells.dependsOn(f_generate_pedestrians);
f_avoid_pedestrians.dependsOn(f_output_pedestrian_location);
f_force_flow.dependsOn(f_avoid_pedestrians);
f_move.dependsOn(f_force_flow);

Alternatively, a more complex layering algorithm should be able to deal with collisions within a wider DAG, generating the fewest layers possible where collisions occur.

I.e. output_navmap_cells does not actually depend on generate_pedestrians; rather, it could be concurrent were it not for the concurrency blocker of the same agent in the same state. This may be very challenging in practice: what if the user-provided output_navmap_cells function mutates an agent variable used within generate_pedestrians? This would require extra specification of what each function will modify, to allow the correct order to be calculated.

f_generate_pedestrians.dependsOn(); // Identifies this as layer 0
f_output_pedestrian_location.dependsOn(f_generate_pedestrians);
f_output_navmap_cells.dependsOn();
f_avoid_pedestrians.dependsOn(f_output_pedestrian_location);
f_force_flow.dependsOn(f_avoid_pedestrians);
f_move.dependsOn(f_force_flow);
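One hypothetical shape for that "extra specification of what each function will modify": every function declares which agent variables it reads and writes, and two functions on the same agent/state may share a layer only if neither writes a variable the other touches. A minimal sketch (the struct and function names are illustrative, not a proposed API):

```cpp
#include <set>
#include <string>

// Hypothetical per-function declaration of agent variable access.
struct VariableAccess {
    std::set<std::string> reads;
    std::set<std::string> writes;
};

// Two functions on the same agent in the same state could run concurrently
// only if neither one writes a variable the other reads or writes.
bool canRunConcurrently(const VariableAccess &a, const VariableAccess &b) {
    for (const auto &w : a.writes)
        if (b.reads.count(w) || b.writes.count(w)) return false;
    for (const auto &w : b.writes)
        if (a.reads.count(w) || a.writes.count(w)) return false;
    return true;
}
```

Under this scheme, output_navmap_cells could legitimately be scheduled alongside generate_pedestrians whenever their declared access sets do not conflict.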

It may also be possible to infer the order of functions based on message dependencies. I.e. f_avoid_pedestrians depends on m_pedestrian_location, so it could be inferred that it must execute after f_output_pedestrian_location. Similarly, f_force_flow depends on m_navmap_cell, so it must be called after f_output_navmap_cells. However, these two are both from the same agent in the same state, so they cannot execute in the same layer, and which order is correct depends on their implementations, hence the explicit f_force_flow.dependsOn(f_avoid_pedestrians).

It should be possible to remove the explicit f_avoid_pedestrians.dependsOn(f_output_pedestrian_location); dependency and still construct layers in this case, however if multiple agent functions output to a message list the implicit ordering would be much more ambiguous.
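The inference described above could be done mechanically: every function that outputs to a message list gains an edge to every function that inputs from it. A sketch, assuming a flat name-based representation of the declarations (again illustrative, not a proposed API):

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// A (function name, message list name) declaration.
using Binding = std::pair<std::string, std::string>;

// Derive implicit dependency edges (producer -> consumer) from message
// output/input declarations: each consumer gains an edge from every
// producer of the message list it reads.
std::set<std::pair<std::string, std::string>> inferMessageEdges(
    const std::vector<Binding> &outputs, const std::vector<Binding> &inputs) {
    std::set<std::pair<std::string, std::string>> edges;
    for (const auto &out : outputs)
        for (const auto &in : inputs)
            if (out.second == in.second)
                edges.insert({out.first, in.first});
    return edges;
}
```

For PedestrianNavigation this yields exactly the two implicit edges discussed: f_output_pedestrian_location -> f_avoid_pedestrians and f_output_navmap_cells -> f_force_flow. It also makes the ambiguity visible: with two producers of one list, both gain edges to the consumer, but their order relative to each other remains undefined.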

For this model, it should be possible to reconstruct the layers used with just 3 explicit function dependencies, alongside the implicit dependencies, but more complex models would likely need many more dependencies explicitly stated.

If this is looked at from a modelling perspective, rather than the layer-based perspective it might be possible to make this specification simpler.

In this model, the move method is the final step of the iteration. It relies on the pedestrian agent being aware of the forces from other pedestrians, and from the environment: f_avoid_pedestrians and f_force_flow (assuming these are order-independent).

f_avoid_pedestrians and f_force_flow have implicit message input dependencies on m_pedestrian_location and m_navmap_cell respectively.

m_pedestrian_location and m_navmap_cell have implicit message output dependencies on f_output_pedestrian_location and f_output_navmap_cells respectively. Lastly output_pedestrian_location must have an explicit dependence on generate_pedestrians (potentially implicit, but order is unclear).

This would produce the dependency graph (from the bottom up):

     n.generate_pedestrians()
                |
                v
p.output_pedestrian_location()    n.output_navmap_cells()
             |                              |
             v                              v
  <m_pedestrian_location>           <m_navmap_cell>
             |                              |
             v                              v
   p.avoid_pedestrians()             p.force_flow()
                      |                 |
                      v                 v
                         p.f_move()

and only require 2 (3?) explicit dependencies.

f_move.dependsOn({f_avoid_pedestrians, f_force_flow});
f_output_pedestrian_location.dependsOn(f_generate_pedestrians);

Flattening this to layers would only require deciding which of f_avoid_pedestrians and f_force_flow is executed first. If they were known to be independent (i.e. neither mutates an agent variable read by the other) they could even be executed concurrently, or fused into a single kernel launch.

ptheywood commented 4 years ago

The contents of a message list persist from one iteration of the model to the first agent function which outputs to that message list in the subsequent iteration.

This reduces the usefulness of message list dependencies for determining order of execution (beyond prohibiting output to and input from the same message list within a single layer), as even in a simple case with just one input and one output function, both output -> input and input -> output would be valid, just with different behaviour.

dentarthur commented 4 years ago

I am still catching up on reading, but have started thinking about a Domain Specific Language (DSL) based on Model Driven Architecture (MDA) that would provide dependency analysis at the higher-level model rather than doing it in CUDA code.

So in case it's relevant here I will just link to the material I am still studying, without notes explaining.

https://github.com/thecapitalistcycle/CSXMS/blob/master/FGPU_DSL_links.md