the current version of balance_load uses blocking send/recv calls and communication lists, which can become a bottleneck in large-scale computations. this should be changed to non-blocking communication at some point.
a further drawback is that the routine requires additional memory for the send/recv operations: if called on a full grid with, say, 1024 blocks, balance_load may require 1536 allocated blocks, i.e. 50% more than the grid itself holds.