cloth link-order optimization

The btSoftBody object from the BulletSDK includes an array of Nodes and Links. 
These links appear to be first set up to connect a node to between 5 and 6 of 
its neighbors [480 links], and then to the rest of the nodes after the 
execution of the Floyd-Warshall graph algorithm [another 930 links]. The way 
the links are stored by default, we have a number of cases where adjacent links 
share a node in common - this leads to the creation of a data dependency 
through memory. The PSolve_Links() function reads and writes nodes as it 
iterates over each link. So, we now have the possibility of a data dependency 
between iteration X that processes link L with iteration X+1 that processes 
link L+1 because L and L+1 have one node in common, and iteration X updates the 
positions of that node, and iteration X+1 reads in the position of that shared 
node.

Such a memory dependency limits the ability of a modern CPU to speculate beyond 
a certain point because it has to respect a possible dependency - this prevents 
the CPU from making full use of its out-of-order resources. If we re-order the 
links such that we minimize the cases where a link L and L+1 share a common 
node, we create a temporal gap between when the node position is written, and 
when it is subsequently read. This in turn allows the CPU to continue execution 
without risking a dependency violation. Such a reordering would result in 
significant speedups on modern CPUs with lots of execution resources. In our 
testing, we see it have a tremendous impact not only on the A7, but also on all 
x86 cores that ship with modern Macs. The attached source file includes a 
single function (Reoptimize_Link_Order) which can be called on a btSoftBody 
object in the solveConstraints() function before the actual solver is invoked, 
or right after generateBendingConstraints() once we have all 1410 links.
Original issue reported on code.google.com by erwin.coumans on 11 Nov 2013 at 6:40
Attachments:
reoptimizeLinkOrder2.cpp
SUBIRDUTTA1978 / bullet

cloth link-order optimization #766