MengeCrowdSim / Menge

The source code for the Menge crowd simulation framework
Apache License 2.0
138 stars 64 forks source link

Determinism in multi-threaded execution #154

Open curds01 opened 3 years ago

curds01 commented 3 years ago

Relates to #152.

Problem

When executing the simulation loop across multiple threads, we need to guarantee that two agents are allocated a pseudo-random number deterministically regardless of how they are scheduled by the operating system.

Proposals

  1. Conservative allocation: in this case, prior to every simulation step, all random number generators deterministically allocate one random value for each possible allocation (e.g., one per agent). This is stored in a table such that there is a deterministic map between agent and table entry. Regardless of what order the agents are evaluated, because the table has been pre-populated in a deterministic manner, and they have a known, unique mapping into the table, they'll receive their value deterministically.
    • Pros
      • This feels like a change that would be performed mostly under the hood with the smallest of changes leaking through to the element implementations.
    • Cons
      • Would require a change to the random number generator APIs:
        • Requesting a value would require a "key" value of some sort. Rather than simply requesting a value, we would have to request a value for an agent (or some other entity).
        • Random number generators would have to be informed of how big the table is that needs to be populated.
        • We'd have to make sure there's a robust mapping mechanism to account for (eventually) a changing population of agents and the possibility that there are more kinds of requesters than just agents.
      • Each step would start with a huge allocation step for random values despite the fact that some non-trivial portion of those values may not be used.
    • Unknowns
      • Are all random values associated with agents?
  2. Deferred allocation: Every request for a random value places the requester in a priority queue. Once all such requests have been made, the random number generators allocate values in priority order. In this case, the "priority" would be the natural ordering of the agent (or some such thing).
    • Pros
      • This doesn't "waste" random values or time in computing/storing random values that don't get used.
    • Cons
      • Has similar requirements on API changes (e.g., when we need to define natural ordering of the entities requesting random values).
      • Has one of the more complex implementations.
      • When an agent needs a random value, that agent's subsequent computation must be deferred until all other agents have been evaluated. If OpenMP deals with agents in blocks, that means we can't just suspend that agent's thread as it might interfere with other agents even starting the evaluation, leading to a race condition.
      • That means, we'd have to explicitly put the agent in a special deferred state.
      • Because BFSMs can evaluate through multiple transitions in a single state, this deferral process can occur multiple times in a time step.

Conclusion

We'll explore option 1. Option 2 seems to be high cost for what is nominally a performance optimization.

Note

This would also be a good time to upgrade the random number generation to modern C++.