On an Intel i7-4790K (4 physical cores) with a test set of ~1M objects, I get ~3.5x speedup.
On a dual E5-2643 (8 physical cores), I get ~5x speedup.
There is still a significant single-threaded portion; I suspect it is the insertion into the quad tree. I haven't checked whether the quad tree is safe for concurrent insertion, and I doubt it is...
EDIT: Running gprof on a sample run shows that indeed, 10% of the time is spent in QuadTree::insert (82% is spent in QuadTree::updateBodyForce, and 4% in Layout::updateSpringForce).