bepu / bepuphysics2

Pure C# 3D real time physics simulation library, now with a higher version number.
Apache License 2.0
2.35k stars 272 forks source link

Batched child lookups for tree-involving collision tasks #34

Open RossNordby opened 6 years ago

RossNordby commented 6 years ago

While child lookups in mesh/compound tasks are not a dominant cost, they still show up quite clearly in profiling.

By virtue of dealing with a smattering of different convexes colliding with potentially different meshes or different areas of the same mesh, memory latency can be a concern. Currently, the per-pair lookups are handled sequentially, so the CPU spends a lot of time stalled. The intersection routines are also not vectorized.

Both of these issues can be addressed by dispatching a bunch of independent loads and then performing vectorized tests on the results.

This will require some overhead. There is no shared state between the different traversals, unlike the RayBatcher. Each step of the traversal will generate a new list of node targets which need to be gathered into a vector friendly format. Given the lack of knowns, that gathering process will likely need to also pull the query data into position.

This concept can also extend to ray tests. It's unlikely to beat the RayBatcher for bulk testing, but it's possible some variant could win on small packets. Ray tests will require a little more care in traversal ordering as usual, but there's nothing blocking that conceptually.

Potential issues:

  1. While this can hide a lot of memory latency, it introduces a bunch of gather work and instruction overhead. It could easily end up slower. Platform intrinsics could help if it turns out codegen issues are the main concern.
  2. The parallel traversal stack can become relatively large. If the batched test count is too high, it could easily start evicting other task batch data from L1. For collision tasks, batch size will be limited.