BB-63: Improving Composability of GrB_init() and GrB_finalize()

Creating this issue based on our discussion of issue BB-10 (https://bitbucket.org/aydozz/graph-blas-spec/issues/10/clarify-grb_init-and-grb_finalize-errors).

Suppose there are two higher-level libraries, A and B, that both use GraphBLAS internally, opaque to the user.

A::init();
B::init();

auto a = A::DataType();
...

B::finalize();
A::finalize();

Currently, GrB_init may only be called once. If B attempts to call GrB_init after A has already initialized a GraphBLAS context, the result is undefined behavior. Two possible solutions that would allow this code to work are 1) adding GrB_initialized and GrB_finalized functions, so that B can check whether a GraphBLAS context already exists, or 2) allowing GrB_init and GrB_finalize to be called more than once.
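To make option 1 concrete, here is a minimal sketch in C. Everything in it is a stand-in: mock_grb_init and mock_grb_initialized are hypothetical functions that mimic a GraphBLAS runtime so the example compiles on its own, and libA_init / libB_init play the roles of A::init() and B::init(). The -1 return for a repeated init is also an assumption, since the spec currently leaves that case undefined.

```c
#include <stdbool.h>

/* Stand-in for the GraphBLAS runtime so the sketch runs without the library. */
static bool mock_ctx_live = false;   /* is there a live context? */
static int  mock_init_calls = 0;     /* how many times init really did work */

/* Mimics the "init may only run once" rule: 0 on success, -1 on a repeat.
   (Assumption: the real spec leaves a second call undefined.) */
static int mock_grb_init(void) {
    if (mock_ctx_live) return -1;
    mock_ctx_live = true;
    mock_init_calls++;
    return 0;
}

/* Hypothetical query in the spirit of option 1; not part of the current spec. */
static bool mock_grb_initialized(void) {
    return mock_ctx_live;
}

/* Each higher-level library initializes GraphBLAS only if nobody has yet. */
static int libA_init(void) {
    return mock_grb_initialized() ? 0 : mock_grb_init();
}

static int libB_init(void) {
    return mock_grb_initialized() ? 0 : mock_grb_init();
}
```

With this, calling libA_init() and then libB_init() performs one real initialization. Note that the check-then-init pattern is not thread-safe as written, and a query alone does not settle which library is responsible for calling GrB_finalize.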

The current workaround for this would likely be to have the user call GrB_init, then pass any necessary information about the context into A::init() and B::init().

In our call this week, Jose pointed out an issue arising when high-level libraries are used in a non-overlapping way.

A::init();
A::finalize();

B::init();
B::finalize();

Currently, similar to MPI and OpenSHMEM, GraphBLAS does not allow initializing a new context after GrB_finalize has been called. Supporting this use case would likely require allowing GrB_init to be called more than once in an application. Otherwise, the user will always need to initialize GraphBLAS themselves.
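One way to picture "GrB_init may be called more than once" is reference counting: the first init creates the context, the last matching finalize destroys it, and a later init creates a fresh one. The sketch below is only an illustration of that idea in C. The rc_* names are hypothetical and nothing here is taken from the spec or from an existing implementation.

```c
static int rc_users = 0;      /* init calls not yet matched by a finalize */
static int rc_setups = 0;     /* times a context was really created */
static int rc_teardowns = 0;  /* times a context was really destroyed */

/* The first caller creates the context; later callers only bump the count. */
static int rc_init(void) {
    if (rc_users++ == 0) rc_setups++;
    return 0;
}

/* The last caller destroys the context; an unmatched finalize is an error. */
static int rc_finalize(void) {
    if (rc_users == 0) return -1;
    if (--rc_users == 0) rc_teardowns++;
    return 0;
}
```

The nested sequence (A::init, B::init, B::finalize, A::finalize) then performs one setup and one teardown, while the non-overlapping sequence performs a setup and a teardown per library. A real implementation would also need to make the counter updates atomic.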

I currently use GrB_init and GrB_finalize to initialize some global space, and nesting them like this will cause failures.

Internally, I have a statically allocated global array of 64 void * pointers. I use this as a memory pool, where the kth entry in the array is the head pointer of a linked list of blocks of size exactly 2^k. These blocks came from malloc and were later "freed" by me, but instead of actually freeing them I put them in this pool. I only do this for small blocks. This cuts down on "malloc a tiny block; free the tiny block; malloc a tiny block; free the tiny block", which is slow. This kind of churn occurs for things like BFS on the Road graph in the GAP benchmark.

GrB_init sets this array to all NULL, and GrB_finalize walks through the array and each linked list, freeing all the blocks.
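For readers who have not seen this kind of pool, a rough C sketch of the scheme described above follows. It is not the actual implementation: the pool_* names, the 1 KiB cutoff for "small" blocks, and the choice to store the link pointer inside each free block are all assumptions made for illustration, and the code is not thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_BUCKETS  64
#define POOL_MAX_SIZE ((size_t) 1024)  /* assumption: only blocks up to 1 KiB are pooled */

/* pool[k] heads a singly linked list of free blocks of exactly 2^k bytes.
   The link to the next block is stored in the first bytes of each free block. */
static void *pool[POOL_BUCKETS];

/* Smallest k such that 2^k >= size and 2^k can hold a link pointer.
   Only called with size <= POOL_MAX_SIZE. */
static int pool_bucket(size_t size) {
    int k = 0;
    size_t cap = 1;
    while (cap < size || cap < sizeof(void *)) { cap <<= 1; k++; }
    return k;
}

/* What an init would do: start with every list empty. */
static void pool_init(void) {
    for (int k = 0; k < POOL_BUCKETS; k++) pool[k] = NULL;
}

/* Reuse a pooled block if one is available, otherwise fall back to malloc. */
static void *pool_alloc(size_t size) {
    if (size > POOL_MAX_SIZE) return malloc(size);
    int k = pool_bucket(size);
    if (pool[k] != NULL) {
        void *block = pool[k];
        pool[k] = *(void **) block;      /* pop the head of list k */
        return block;
    }
    return malloc((size_t) 1 << k);
}

/* "Free" a small block by pushing it onto its list; size must be the
   size that was passed to pool_alloc. Large blocks go straight to free. */
static void pool_free(void *block, size_t size) {
    if (block == NULL) return;
    if (size > POOL_MAX_SIZE) { free(block); return; }
    int k = pool_bucket(size);
    *(void **) block = pool[k];          /* push onto list k */
    pool[k] = block;
}

/* What a finalize would do: walk every list and really free the blocks. */
static void pool_finalize(void) {
    for (int k = 0; k < POOL_BUCKETS; k++) {
        while (pool[k] != NULL) {
            void *next = *(void **) pool[k];
            free(pool[k]);
            pool[k] = next;
        }
    }
}
```

In this sketch a second pool_init while blocks are pooled would leak them, and a pool_finalize leaves the array in the same all-NULL state as pool_init, which is why freeing the pool more than once is harmless.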

Now suppose GrB_init / GrB_finalize is nested. Do the applications A and B have their own global space for this free pool? All calls to GrB_* methods will access this free pool, but if the 2nd GrB_init comes along, would it wipe out this free pool?

The 2nd GrB_init could instead note that GrB_init has already been called, and silently do nothing (and not return an error).

The 1st GrB_finalize could then free all these free pools, but then I would have to permit future calls to GrB methods. I could do that if you like. Freeing the set of free pools is threadsafe.

I do many more things in my GrB_init, such as: (1) I set malloc/calloc/realloc/free. Woe to the applications A and B if they each want their own memory managers. (2) I set the mode, blocking or nonblocking. Woe if A and B want different modes. (3) I set the max # of threads to use and the default format (by row or by col). Woe if A and B want different formats (A wants by-row as the default and B wants by-col). (4) I clear counters I use for debugging memory management problems; these will fail abysmally if A still has memory allocated when B calls finalize.

I can work around many of these issues if you need me to, but not the malloc/calloc/realloc/free. I cannot let A use one set of managers and B another. I think I can work through all the other issues, even my free pool (A:init can clear it, B:init can see it is already initialized and not touch it; B:finalize can free it all, which is OK if A keeps going, and then A:finalize can free the free pool yet again). A and B would have to agree on all kinds of global settings, like the mode (GrB_BLOCKING and GrB_NONBLOCKING).
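The "A and B would have to agree" constraint could be checked at init time. The following C sketch is purely hypothetical: the first caller's settings win, and a later caller is told whether its request matches what is already in force. The cfg_* names and the choice of fields are assumptions, with CFG_BLOCKING / CFG_NONBLOCKING standing in for GrB_BLOCKING / GrB_NONBLOCKING.

```c
#include <stdbool.h>

/* Stand-ins for the global settings mentioned above. */
typedef enum { CFG_BLOCKING, CFG_NONBLOCKING } cfg_mode;
typedef enum { CFG_BY_ROW, CFG_BY_COL } cfg_format;

typedef struct {
    cfg_mode   mode;
    cfg_format format;
    int        max_threads;
} cfg_settings;

static bool         cfg_active = false;  /* has any init been accepted yet? */
static cfg_settings cfg_current;         /* settings chosen by the first init */

/* Returns true if the request is accepted: either this is the first
   initialization, or the request matches the settings already in force. */
static bool cfg_init(cfg_settings want) {
    if (!cfg_active) {
        cfg_current = want;
        cfg_active = true;
        return true;
    }
    return want.mode == cfg_current.mode
        && want.format == cfg_current.format
        && want.max_threads == cfg_current.max_threads;
}
```

Whether a mismatch should be an error or be silently ignored is exactly the kind of policy question the spec would have to answer.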

However this is solved, you'll need to consider how I'm using the free pools, if you want to change the semantics of GrB_init and GrB_finalize to support multiple calls to them, nested or otherwise.

One solution is for GrB_init to not return GrB_INVALID_VALUE if it has already been called. Instead, it could return GrB_SUCCESS and do nothing, since GraphBLAS is already initialized. See this test:

https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/905b1d54bef971db70180933454f65ab1f6f364b/Source/GB_init.c#L65

I know whether GrB_init has ever been called, since I have a global flag that starts as false and gets set by GrB_init. This could be used to allow B:init to be called after A:init. The call to B:init would do nothing at all.

Then B:finalize can safely free the memory pools I have. This is thread-safe. Then A can keep working and restock the memory pool as it works, and then safely do A:finalize.

In this manner, I could support any mixture like

A:init
B:init
C:init
B:finalize
A:finalize
D:init
C:finalize

or whatever mix you like. The A:init would do all the work. The other inits would do nothing. All calls to finalize would free my internal pools but this could be made thread-safe, even if other threads are working with the pool.
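As a toy model of this behavior in C: the first init does the real work and sets a flag, later inits return success without doing anything, and every finalize releases the pooled blocks but leaves the library usable. The demo_* names are hypothetical, and a simple counter stands in for the real free pool.

```c
#include <stdbool.h>

typedef enum { DEMO_SUCCESS = 0 } demo_info;

static bool demo_initialized = false;  /* starts false, set by the first init */
static int  demo_setup_runs  = 0;      /* how many inits did real work */
static int  demo_pool_blocks = 0;      /* stand-in for the number of pooled blocks */

/* The first call does all the work; later calls succeed and do nothing. */
static demo_info demo_init(void) {
    if (demo_initialized) return DEMO_SUCCESS;
    demo_initialized = true;
    demo_setup_runs++;
    demo_pool_blocks = 0;
    return DEMO_SUCCESS;
}

/* Stand-in for any GrB method that returns a small block to the pool. */
static void demo_work(void) {
    demo_pool_blocks++;
}

/* Every finalize empties the pool but does not clear the flag,
   so other users can keep working and restock the pool. */
static demo_info demo_finalize(void) {
    demo_pool_blocks = 0;
    return DEMO_SUCCESS;
}
```

Running the A/B/C/D sequence above through this model gives one real setup, and each finalize simply leaves an empty pool for whoever is still running.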

But would this be OK with other implementations?
