bepu / bepuphysics2

Pure C# 3D real time physics simulation library, now with a higher version number.
Apache License 2.0

Sleeper exhibits performance cliff at >4,000,000 active bodies #284

Open RossNordby opened 11 months ago

RossNordby commented 11 months ago

The IslandSleeper works by initiating a constraint graph traversal at candidate bodies, terminating when a full island has been found or an active body is encountered (thereby forcing the whole island active).
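The traversal described above can be sketched roughly as follows. This is a minimal illustration in Python, not the library's C# implementation: the `neighbors` adjacency dictionary, the `is_candidate` callback, and the function name are all hypothetical, and only bodies (not constraints) are tracked as visited.

```python
from collections import deque

def find_island(start, neighbors, is_candidate):
    """Return the island containing `start`, or None if it must stay active.

    neighbors: dict mapping a body index to the bodies it shares a constraint with
               (hypothetical graph layout, for illustration only).
    is_candidate: callable returning True if a body is quiet enough to sleep
                  (hypothetical stand-in for the sleeper's activity test).
    """
    visited = {start}        # fresh visited set for every traversal
    queue = deque([start])
    island = []
    while queue:
        body = queue.popleft()
        if not is_candidate(body):
            # An active body was encountered: the whole island stays active.
            return None
        island.append(body)
        for other in neighbors.get(body, ()):
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return island

# Two islands: {0, 1, 2} is entirely at rest, {3, 4} contains an active body (4).
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
resting = {0, 1, 2, 3}
sleepable = find_island(0, graph, resting.__contains__)
kept_awake = find_island(3, graph, resting.__contains__)
```

Here `sleepable` holds the three bodies of the first island, while `kept_awake` is `None` because body 4 is still active.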

The traversal marks bodies and constraints as visited using a couple of locally allocated IndexSet instances. They are as large as the largest BodyHandle or ConstraintHandle. For a simulation with 8 million bodies, the visited bodies set would be a megabyte (1 bit per body).
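To make the size concrete, here is a small Python sketch of a one-bit-per-index set. `ToyIndexSet` is a hypothetical stand-in written for this note, not the library's `IndexSet`, and it assumes the capacity equals the largest handle value.

```python
class ToyIndexSet:
    """One bit per index, backed by a bytearray (illustrative only)."""

    def __init__(self, capacity):
        # Round up so every index in [0, capacity) has a bit.
        self.bits = bytearray((capacity + 7) // 8)

    def add(self, index):
        self.bits[index >> 3] |= 1 << (index & 7)

    def contains(self, index):
        return bool(self.bits[index >> 3] & (1 << (index & 7)))

    def clear(self):
        # Rewrites every byte, no matter how few bits were actually set.
        self.bits[:] = bytes(len(self.bits))

visited = ToyIndexSet(8_000_000)
size_in_bytes = len(visited.bits)   # 8,000,000 bits / 8 = 1,000,000 bytes
visited.add(1234)
```

The point of the sketch is `clear`: its cost depends on the capacity of the set, not on how many bodies the traversal actually visited.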

While allocating that space isn't much of a problem, clearing it for each new traversal can be. This is for two reasons:

  1. By default, the sleeper tries to analyze 1% of all active bodies per timestep. When there are 8 million active bodies, there are 80,000 traversals per timestep, and so 80,000 clears.
  2. Each clear touches the entire set, about 1e6 bytes at that scale: 80,000 * (~1e6 bytes per traversal) / 100e9 bytes per second of memory bandwidth = ~0.8 seconds.

In other words, both the number of clears and the size of each clear grow linearly with the body count, so the cost of the IslandSleeper is weakly quadratic.
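The estimate above can be reproduced with a few lines of Python. This is a back-of-the-envelope model only: the function name is made up for this note, and it assumes one bit per active body, a 1% tested fraction, 100e9 bytes per second of memory bandwidth, and that every clear is bandwidth bound.

```python
def estimated_clear_seconds(active_bodies,
                            tested_fraction=0.01,
                            bandwidth_bytes_per_second=100e9):
    """Rough per-timestep cost of clearing the visited set for every traversal."""
    traversals = active_bodies * tested_fraction   # one clear per traversal
    bytes_per_clear = active_bodies / 8            # one bit per body (assumed)
    return traversals * bytes_per_clear / bandwidth_bytes_per_second

at_8_million = estimated_clear_seconds(8_000_000)   # about 0.8 seconds
at_4_million = estimated_clear_seconds(4_000_000)   # about a quarter of that
```

Halving the body count cuts the estimate to a quarter, which is the quadratic growth in question. The model only applies once the set no longer fits in cache.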

The reason it's not a problem at smaller sizes is that a locally allocated IndexSet can be held entirely in core-local cache. The moment a traversal gets large enough to evict itself, you see the bandwidth-bound cost. That's why a smaller simulation of, say, 2 million active bodies takes 0.008 seconds to run the sleeper, not the 0.05 seconds that the same bandwidth estimate would predict.

This is not exactly a major near-term priority. While CPUs and the library are both improving, we're still at least a factor of 10 off from needing to worry about simulations of 8 million active bodies in real-time use cases.

Something to consider later, or if we have a compelling offline use case. People testing ridiculous simulations for funsies might get confused, I guess.

As a workaround, disabling the sleeper or dramatically reducing the aggressiveness of the sleeper (IslandSleeper.TestedFractionPerFrame and friends) would work.

RossNordby commented 11 months ago

Note for potential attempts at addressing this early: the IslandSleeper would benefit from a revamp. At the moment, it cannot multithread individual traversals, so even at more reasonable simulation scales, a single pile of 30,000 nearly-sleepy bodies can take a hefty chunk of frame time.

The changes required for a multithreaded traversal would also affect how body marking works, so any fix for the performance cliff alone might get obviated by later changes.