This is related to my problem discussed in #14782, but I'm finding that even after creating a thread-safe version of my IK problem, I'm getting poor CPU utilization when running in parallel.

I've set up five benchmarks to compare the differences:

1. `Solve(ik.prog())` `NUM_THREADS` times with one thread without any constraints, in a for loop of `NUM_THREADS` iterations (https://gist.github.com/brawner/9d1512745feeadffd02944580ccfc549#file-parallel-vs-single-core-comparisons-L68-L82)
2. `Solve(ik.prog())` `NUM_THREADS` times without any constraints across `NUM_THREADS` threads (https://gist.github.com/brawner/9d1512745feeadffd02944580ccfc549#file-parallel-vs-single-core-comparisons-L84-L108)
3. Single-threaded with a randomly reachable pose constraint for `NUM_THREADS` iterations, but with an early break if a successful configuration is found (https://gist.github.com/brawner/9d1512745feeadffd02944580ccfc549#file-parallel-vs-single-core-comparisons-L110-L134)
4. `Solve(ik.prog())` multi-threaded with a randomly reachable pose constraint with `NUM_THREADS` threads (https://gist.github.com/brawner/9d1512745feeadffd02944580ccfc549#file-parallel-vs-single-core-comparisons-L136-L171)

Results on an AMD Threadripper 3970X (32 cores) with 64 GB RAM and 64 threads.

These results are surprising because they suggest that running `64` iterations in a single-core for loop is faster than running `64` concurrent threads without any constraints, and not that much worse with a pose constraint. `BM_solve_ik_random_constraints_single_core_early_break` is significantly faster than running many threads in parallel to arrive at one solution, which is disappointing because I was hoping that switching to a multi-threaded version would yield better performance.

You'll notice that the difference in CPU time between benchmarks 4 and 5 is much more pronounced than the difference in wall time, which I suspect means there is a lot of lock contention happening somewhere.
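
For anyone who wants to check the CPU-time versus wall-time gap outside the benchmark harness, here is a minimal sketch that measures both around an arbitrary piece of work. `Timing`, `MeasureWallAndCpu`, and `AverageBusyCores` are hypothetical names introduced for illustration, not part of the gist or of Drake. It assumes a POSIX-like platform where `std::clock` reports CPU time summed over all threads of the process; on some platforms `std::clock` behaves differently, so treat the numbers as a rough indication only.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <functional>

// Hypothetical helper type: wall-clock and process CPU time for one run.
struct Timing {
  double wall_seconds;
  double cpu_seconds;
};

// Runs `work` once and records elapsed wall time and process CPU time.
// Assumption: std::clock() accumulates CPU time over all threads (POSIX).
Timing MeasureWallAndCpu(const std::function<void()>& work) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();
  work();
  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Timing t;
  t.wall_seconds =
      std::chrono::duration<double>(wall_end - wall_start).count();
  t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  return t;
}

// CPU time divided by wall time: roughly how many cores were kept busy
// on average while the work ran.
double AverageBusyCores(const Timing& t) {
  assert(t.wall_seconds > 0.0);
  return t.cpu_seconds / t.wall_seconds;
}
```

Wrapping the body of the multi-threaded benchmark in `MeasureWallAndCpu` and looking at `AverageBusyCores` gives one number to compare against the thread count. A value far below the number of threads would be consistent with threads waiting on each other, though it does not by itself show where the waiting happens.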

Some notes about the benchmarks: In typical single-core use cases, you wouldn't recreate the IK problem, which is what I've done for benchmarks 1 and 3. For benchmark 5, I try to match the contents of the thread with the contents of the for loop, and I still think it illustrates the issue. You would expect benchmark 4 to be up to 64 times faster than benchmark 5, but I've typically seen only about a 3x improvement, like above.
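
To put the "3x instead of up to 64x" observation into numbers, the arithmetic can be sketched as below. The function names are hypothetical. The last helper assumes the result can be modelled with Amdahl's law, which is only a modelling assumption and not something the benchmarks establish.

```cpp
#include <cassert>

// Speedup of a parallel run relative to a serial run of the same work.
double Speedup(double serial_seconds, double parallel_seconds) {
  assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
  return serial_seconds / parallel_seconds;
}

// Fraction of the ideal (linear) speedup actually achieved.
double ParallelEfficiency(double speedup, int workers) {
  assert(workers > 0);
  return speedup / workers;
}

// Assumption: Amdahl's law, S = 1 / (s + (1 - s) / N), solved for the
// serial fraction s given an observed speedup S on N workers.
double ImpliedSerialFraction(double speedup, int workers) {
  assert(speedup > 0.0 && workers > 1);
  return (workers / speedup - 1.0) / (workers - 1.0);
}
```

With the figures quoted above, `ParallelEfficiency(3.0, 64)` is 3/64, or roughly 4.7% of the ideal speedup. Under the Amdahl assumption, `ImpliedSerialFraction(3.0, 64)` comes out at about 0.32, meaning roughly a third of the work would be behaving as if it were serialized.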
Full code at: https://gist.github.com/brawner/9d1512745feeadffd02944580ccfc549
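
For readers who don't want to open the gist, the two patterns being compared (a single-threaded loop with an early break versus one thread per attempt) look roughly like the sketch below. This is not the gist's code: `TrySolve` is a hypothetical, deterministic stand-in for one IK attempt and makes no Drake calls, and the function names are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one IK attempt; NOT a Drake call.
// "Succeeds" only for seeds divisible by 7 so the example is deterministic.
bool TrySolve(std::uint32_t seed) { return seed % 7 == 0; }

// Single-threaded: try seeds in order and stop at the first success.
std::optional<std::uint32_t> SolveSequentialEarlyBreak(
    std::uint32_t first_seed, int num_attempts) {
  assert(num_attempts > 0);
  for (int i = 0; i < num_attempts; ++i) {
    const std::uint32_t seed = first_seed + i;
    if (TrySolve(seed)) return seed;
  }
  return std::nullopt;
}

// Multi-threaded: one thread per attempt. A shared atomic flag lets
// threads that start late skip their attempt once any thread succeeded.
std::optional<std::uint32_t> SolveParallel(std::uint32_t first_seed,
                                           int num_threads) {
  assert(num_threads > 0);
  std::atomic<bool> found{false};
  // One int per thread (not vector<bool>) so each thread writes only
  // its own memory location.
  std::vector<int> success(num_threads, 0);
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&, i] {
      if (found.load(std::memory_order_relaxed)) return;
      if (TrySolve(first_seed + i)) {
        success[i] = 1;
        found.store(true, std::memory_order_relaxed);
      }
    });
  }
  for (auto& t : threads) t.join();  // join makes the writes visible here
  for (int i = 0; i < num_threads; ++i) {
    if (success[i]) return first_seed + i;
  }
  return std::nullopt;
}
```

Note that the parallel variant may return a different successful seed than the sequential one, since whichever thread succeeds first can cause others to skip. In this sketch the threads share nothing except the flag, so any contention seen in the real benchmarks would have to come from inside the per-attempt work (for example allocation or shared solver state), which is exactly what the stand-in leaves out.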