MengeCrowdSim / Menge

The source code for the Menge crowd simulation framework
Apache License 2.0
139 stars 64 forks source link

determinism of simulation #105

Closed pgruenbacher closed 5 years ago

pgruenbacher commented 6 years ago

I know there's the project-level seed for the random generators. How feasible is this library for gamedev-level determinism? I think the first issue that comes to my mind is that I need to make sure the RandomGenerators aren't being called while in the omp parallel for loops.

I suppose it may also need to be compiled with fixed point precision instead of float precision. Does that seem feasible?

https://gafferongames.com/post/deterministic_lockstep/

pgruenbacher commented 6 years ago

Hm so yea it seems what when it gets down to the units interacting with on another the simulation can diverge fairly quickly.

image

pgruenbacher commented 6 years ago

double

pgruenbacher commented 6 years ago

ran periodic with random="10" on the project. Pretty sure it's diverging. I may try to recompile with OMP just to make sure it's not a parallel issue.

divegence2

pgruenbacher commented 6 years ago

diverged2

better proof. same time. same random seed. but selected 115 agent has differing position floats.

pgruenbacher commented 6 years ago

screenshot from 2018-09-03 17-31-17

simulation is deterministic when openmpi turned off... at least when run on same machine.

pgruenbacher commented 6 years ago

Adding a omp ordered keyword seemed to do the trick for syncing the periodic example. I guess cause the computePrefVelocity is being set in parallel, which could effect the other advances? I'm not sure...


        bool FSM::doStep() {
            // NOTE: This is a cast from size_t to int to be compatible with older implementations
            //      of openmp which require signed integers as loop variables
            SIM_TIME = this->_sim->getGlobalTime();
            EVENT_SYSTEM->evaluateEvents();
            int agtCount = (int)this->_sim->getNumAgents();
            size_t exceptionCount = 0;
            #pragma omp parallel for ordered reduction(+:exceptionCount)
            for ( int a = 0; a < agtCount; ++a ) {
                #pragma omp ordered
                {Agents::BaseAgent * agt = this->_sim->getAgent( a );
                try {
                    advance( agt );
                    this->computePrefVelocity( agt );
                } catch ( StateException & e ) {
                    logger << Logger::ERR_MSG << e.what() << "\n";
                    ++exceptionCount;
                }}
            }
            if ( exceptionCount > 0 ) {
                throw FSMFatalException();
            }
            return this->allFinal();
        }```
MengeCrowdSim commented 6 years ago

Looks like I'm late to this party. All I can do is confirm all of the things you've discovered. When evaluating the simulation in parallel, determinism flies out the window and for the reason you've concluded -- agents can enter/leave states in arbitrary order during step advancement (due to the method of parallelization). If any of the BFSM components has a single number generator that is shared across all of its agents (which most do), then the assignment of those random numbers are not deterministic.

I'm glad the ordered keyword solves the problem. You feel like putting that in a PR?

pgruenbacher commented 6 years ago

hm. well I'm guessing the ordered keywordis gonna cause some performance issues since it negates the parallelism benefit. I think since because the RandGenerator aren't thread-safe and they're being used in both test cases I've done (Vec2Dgenerator by the teleport action) and (Timer random) in my custom game case. Let's see if turning off that will resolve the issue. If so, then what I can instead do is have the random generator return random values using the agent id as a seed in combo with a seed that changes each SIM_TIME_STEP. Heck maybe just global_seed + Menge::SIME_TIME + agent->_id. That should mantain pseudo randomness while ignoring a local seed change.

pgruenbacher commented 6 years ago

yep changing timers from uniform dist to constant value resolves that problem. I'll move forward on implementing some new function signatures for the random distributors that accept agent id as a seed (and upate the fsm classes as well that use the distributors).

As for the float vs fixed point issue. My game involves 4000+ agents potentially, so I'll need some sort of determinism for predictive client-side, but I'll also probably need to do server-client updates to keep the game in sync no matter what, since it's not worth trusting deterministic lock-step purely. so I may be able to get away with float desyncing issues across different machines.

pgruenbacher commented 6 years ago

hm actually trying to get the random generators thread-deterministic is actually difficult when trying to also mantain pseudo-randomness. Like maybe each timestep the generator can update, but if multiple agents are accessing the random generator, then I'm not sure how to use that agent's id to randomize the result that each agent gets. Maybe seeds are generated for each agent at the start and then updated in the random generators. just need to enfoce that only the agent belonging to each thread is used, and not other agents. e.g. an agent shouldn't call random generator on other agents while in parallel block.

I'll have to make sure the seed per agents are synced / reconciled in client-server but *shrug

... any ideas?

MengeCrowdSim commented 6 years ago

The most general solution is that every element that supports random values must maintain a unique generator per agent. They should all be seeded differently (but deterministically) and then the order of agent evaluation will not matter. Any aggregation of random value generation (i.e., by sharing a seed) will cause correlated phenomena.

pgruenbacher commented 6 years ago

Right, I guess that makes sense since different random generator types for different elements would need to maintain their own seeds. Ok that makes sense.

Only thing left to investigate for me is how feasible it is to used a fixed point vs float everywhere.

On Tue, Sep 4, 2018 at 10:25 AM Menge crowd simulation < notifications@github.com> wrote:

The most general solution is that every element that supports random values must maintain a unique generator per agent. They should all be seeded differently (but deterministically) and then the order of agent evaluation will not matter. Any aggregation of random value generation (i.e., by sharing a seed) will cause correlated phenomena.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/MengeCrowdSim/Menge/issues/105#issuecomment-418386795, or mute the thread https://github.com/notifications/unsubscribe-auth/ADLZ9gctM5SRZeWNkx7Zzxtn6mzGW8zmks5uXo1SgaJpZM4WYAnx .

MengeCrowdSim commented 6 years ago

Can you elaborate on what you're hoping to accomplish by using fixed-point computation

MengeCrowdSim commented 6 years ago

I generally suspect that moving from floating- to fixed-point computation will be intractable. I'd need to go through and template the scalars and then provide a fixed-point scalar. That is a huge, far-reaching endeavor. To see what that kind of thing looks like, take a look at https://github.com/RobotLocomotion/drake as an example.

pgruenbacher commented 6 years ago

Yikes yea I've seen that sort of templating for scalar in openfoam too. https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/fields/DimensionedFields/DimensionedScalarField/DimensionedScalarField.H I figured this would be something I'd have to do in my fork only so I wouldn't necessarily need to template it to make it compatible for other people. Float operations can be non-deterministic on different hardware from what I've read (though it is deterministic if run on same machine as shown earlier). If a simulation is perfectly deterministic then a multiplayer game could be run with just player actions being synchronized in lock-step. This allows for small bandwidth since only player actions need to be synced. However any small changes would lead to eventual divergence in the game state. Before I prematurely optimize though I should probably just try to get the game network running on my machine first deterministically and see if I can just have a server regularly update any divergences that may happen in the floats.

MengeCrowdSim commented 6 years ago

So, it may not be completely intractable. Some of my underlying mathematical components are already vectorized on scalar (Vector2/3 Matrix). So providing a fixed-point scalar would make those possible. And then that would have to smear over to the floating point generators (becoming "Real" generators for any "real" scalar). And then hitting all of the local members.... And then implicit conversion between fixed-point and floating-point at the visualization interface. Yikes, still lots of work.

I definitely support the idea of getting your network up and running first and seeing how much of a pain point this is.

And now, for my own curiosity, would you pre-condition all of your fixed-point values based on having a fixed simulation domain? That would certainly provide the advantage of not having to do run-time conditioning. it does ruin some of the Menge tricks of teleporting agents off to some distant domain (unless you encoded "infinity" in your fixed-point representation.) While I've toyed with fixed-point, I've never actually tried to engineer a system that would use them in a disciplined manner, so I'm curious as to some of the details.

Also, here was an interesting read on the subject of non-determinism of floating-point values:

https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/

Given that the majority of the operations are linear algebra, you may not have too many determinism issues. Transcendentals will be the biggest issue but, I hope, they don't crop up too much.

pgruenbacher commented 6 years ago

Yea I guess I'd do fixed simulation domain. I have a is_dead() base agent method in my github fork that I use to handle agents that should be ignored so I haven't needed a teleport action yet to disappear agents out of the domain. Yea the article is what mostly informed me. I think a good network system would still try to keep all the agents synced regardless of the determinism, since if you want players to hot-join a server game you'd have to be able to send them the current serialized game state anyways (or fast-forward the entire simulation through the player action list to the current time). A perfectly deterministic simulation can do stuff like replays too with a small file of all the player actions and events. Of course Total War has shown how fragile those game replays can be. Stuff is tough.

pgruenbacher commented 6 years ago

but again I should just test the simulations on different machines first anyways.

pgruenbacher commented 6 years ago

image

periodic deterministic now too. I'll make a PR for the vanilla files I changed.

pgruenbacher commented 6 years ago

actually here's the https://github.com/pgruenbacher/Menge/blob/determinism/src/Menge/MengeCore/Math/RandGenerator.cpp updated random generator code in my fork, may be easier to inspect this way. You'll see I hacked in a MAX_AGENT const and a fixed std::array for tracking per agent seeds in the random generator.