HPX hangs at larger core counts because, by default, BLIS uses non-HPX synchronization primitives. Under the HPX runtime, only HPX synchronization primitives should be used. Hence, a C-style wrapper, hpx_barrier_t, is introduced to perform HPX barrier operations.
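A minimal sketch of the opaque-handle pattern behind such a C-style wrapper may help here. It is not the code from this change: the wrapped class is a small hand-rolled barrier standing in for hpx::barrier so the sketch builds without HPX, and the function names (hpx_barrier_init, hpx_barrier_arrive_and_wait, hpx_barrier_destroy) and the helper count_threads_seeing_all are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the wrapped C++ barrier (hpx::barrier in the real code).
// Hand-rolled with standard primitives only so the sketch builds without HPX.
class cxx_barrier {
public:
    explicit cxx_barrier(std::size_t count) : count_(count) {}

    // Block until count_ callers have arrived, then release them all.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++arrived_ == count_) {
            arrived_ = 0;
            ++generation_;  // start a new phase so the barrier is reusable
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != gen; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_;
    std::size_t arrived_ = 0;
    std::size_t generation_ = 0;
};

extern "C" {

// C-visible handle: C code only ever sees an opaque pointer.
typedef struct hpx_barrier_t { void* handle; } hpx_barrier_t;

// Hypothetical function names, chosen for this sketch.
void hpx_barrier_init(hpx_barrier_t* b, std::size_t count) {
    b->handle = new cxx_barrier(count);
}

void hpx_barrier_arrive_and_wait(hpx_barrier_t* b) {
    static_cast<cxx_barrier*>(b->handle)->arrive_and_wait();
}

void hpx_barrier_destroy(hpx_barrier_t* b) {
    delete static_cast<cxx_barrier*>(b->handle);
    b->handle = nullptr;
}

}  // extern "C"

// Demo helper: each thread bumps a counter, waits at the barrier, and then
// checks that every other thread's increment is visible.
std::size_t count_threads_seeing_all(std::size_t n) {
    hpx_barrier_t barrier;
    hpx_barrier_init(&barrier, n);
    std::atomic<std::size_t> arrived{0};
    std::atomic<std::size_t> ok{0};
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&] {
            arrived.fetch_add(1);
            hpx_barrier_arrive_and_wait(&barrier);
            if (arrived.load() == n) ok.fetch_add(1);
        });
    }
    for (auto& t : threads) t.join();
    hpx_barrier_destroy(&barrier);
    return ok.load();
}
```

The C side only needs the struct and the three function declarations; the C++ type stays hidden behind the void pointer, which is what lets a C code base such as BLIS call into a C++ runtime.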
Using hpx::for_loop with hpx::barrier and thrcomm_t when n_threads exceeds the actual hardware thread count causes synchronization issues that make HPX hang. This can be avoided by using hpx::future objects, which are lightweight, robust, and scale to any number of threads.
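The future-based fork-join pattern can be sketched roughly as follows. This is an illustration only, not the code from this change: it uses std::async and std::future from the standard library as stand-ins so that it builds without HPX (hpx::async and hpx::future follow the same call shape, and hpx::wait_all can replace the join loop), and the names work and run_with_futures are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-thread work item: returns the square of its thread id.
static long work(std::size_t tid) {
    return static_cast<long>(tid * tid);
}

// Launch one task per logical thread and join by waiting on the futures.
// No barrier is involved, so the number of tasks is not tied to the number
// of hardware threads.
long run_with_futures(std::size_t n_threads) {
    std::vector<std::future<long>> futures;
    futures.reserve(n_threads);

    // Fork: with HPX this would be hpx::async(work, tid).
    for (std::size_t tid = 0; tid < n_threads; ++tid) {
        futures.push_back(std::async(std::launch::async, work, tid));
    }

    // Join: get() blocks until the corresponding task has finished.
    long sum = 0;
    for (auto& f : futures) {
        sum += f.get();
    }
    return sum;
}
```

Because each task only signals its own completion, no task ever waits for peers that may not have been scheduled yet, which is the situation a barrier runs into when it expects more participants than there are worker threads.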