erincatto / box2d

Box2D is a 2D physics engine for games
https://box2d.org
MIT License
7.44k stars 1.47k forks source link

Read/Write dependencies on constraints in contact solvers impacting out of order processors #730

Open Developer-Ecosystem-Engineering opened 1 year ago

Developer-Ecosystem-Engineering commented 1 year ago

The contact solver creates constraints between Object A and Object B. When processing these we read values from A and B, and write results to A/B. However, constraints can be created such that the continuous orders have a shared object with the previous constraints that were just processed.

On an out of order processor like Apple silicon, this creates a read after write dependency that forces all constraints to be processed serially, even if other constraints related to other objects could be processed in parallel. If we randomize the order of constraints after they are generated (or the order which the contacts are processed when generating the constraints) it reduces the chances of two continuous constraints array elements being processed by a shared object. As a result we could see a performance improvement of more than 25% in several cases.

We are opening an issue to discuss this topic further per the readme!

erincatto commented 1 year ago

The way to make the solver parallel is to solve separate islands in separate jobs. These jobs should be more substantial and cause lower overhead on a task system, while still offering a large degree of parallelism on large scenes.

Developer-Ecosystem-Engineering commented 1 year ago

Hi Erin,

Thanks for getting back! We are referring more to micro parallelism by changing the order of processing within an island.

The solver work on pairs of elements. It reads the two items calc and store.

If my pairs are (1,2), (2,3), (3,4) the processor will process these constraints serially as any math in the second pair can not continue before the first pair has written it's results, as implemented today. i.e. the value of (3,4) cannot be computed until the value of (2,3) is resolved.

Assuming that we need to process all constraints and their processing order does not matter, which from your talks it does not matter, we could swap the second and third pair. Now parts of the compute of the second pair can start before the first pair has written its results as these don't have any common object, even if they belong to the same island. We are saying the processor can compute (3,4) before waiting on (2,3) to happen and doing this yields about 25% improvement in several cases.

erincatto commented 1 year ago

Sounds reasonable, still the first step should be parallel islands. Then optimization within a job becomes more relevant. Ordering constraints generally does not affect stability or convergence, but the order does need to be stable across frames even when constraints are added and removed. Per frame randomization will yield poor convergence of the solver.

I would first focus on optimizing the contact solver. Here are the inner loops. Velocity solver: https://github.com/erincatto/box2d/blob/9dc24a6fd4f32442c4bcf80791de47a0a7d25afb/src/dynamics/b2_contact_solver.cpp#L297 Position solver: https://github.com/erincatto/box2d/blob/9dc24a6fd4f32442c4bcf80791de47a0a7d25afb/src/dynamics/b2_contact_solver.cpp#L676

Each body could be given a monotonically increasing id. Then a contact constraint could have an id pair such as {id1, id2}. The contact array could be filled in a grid fashion to spread similar id pairs. Or maybe just using front and back fill would be sufficient (meeting in the middle). I would compare stacking stability with and without such re-ordering. Faster math but slower convergence could be a net loss.

Developer-Ecosystem-Engineering commented 1 year ago

Would you be open to a PR with this implemented to review?

erincatto commented 1 year ago

Perhaps make a fork and post your results here.

Developer-Ecosystem-Engineering commented 1 year ago

Noting that this is not abandoned and remains an area of interest on our part.

louis-langholtz commented 1 year ago

Interesting suggestions @Developer-Ecosystem-Engineering!

I'd like to look in detail at what you're proposing and am available for actively pursuing that. I've been playing around with suggestions like you and Erin are proposing. I'm not seeing a fork of Box2D however in your list of repositories.

erincatto commented 1 year ago

FYI: Box2D v3.0 is in active development in this repo: https://github.com/erincatto/box2c There has been a lot of progress on multithreading and work continues on performance improvements.

The Jolt Physics engine has implemented a large island splitter using graph coloring. This achieves the constraint independence you seek: https://github.com/jrouwe/JoltPhysics/blob/master/Jolt/Physics/LargeIslandSplitter.h

I'm considering using this in Box2D v3.0.

Developer-Ecosystem-Engineering commented 6 months ago

Hi Erin, apologies for the delay. Following process outlined, we've posted this to our own fork for review (you too @louis-langholtz )

Yes, we have also seen Jolt thank you for the tip =)

erincatto commented 6 months ago

Box2D version 3.0 is in alpha testing. It is using graph coloring as I mentioned before. It is working well on the M2 and is making use of Neon instructions.

Developer-Ecosystem-Engineering commented 6 months ago

Will take a look at 3.0 alpha!