BB-5: Pointers to allocators/deallocators

mcmillan03 commented 3 years ago

GrB_init should be extended to pass in pointers to malloc, calloc, realloc, and free. Required for matrix import/export. See SuiteSparse:GraphBLAS GxB_init().

tgmattso commented 3 years ago

I don't know the best way to do this, but I agree in principle that a user should be able to specific the allocator used for any objects created by the graphBLAS. This may not be practical (I haven't thought through the implications) but it is something we may need.

DrTimothyAldenDavis commented 3 years ago

This is essential. I propose some method like (pick a name):

GrB_init2 (mode, mallocfunc, callocfunc, reallocfunc, freefunc) ;

RedisGraph requires it: https://github.com/RedisGraph/RedisGraph/blob/a07ef636521dbd10735b474389132673ce2158dc/src/module.c#L70

Julia requires it: https://github.com/JuliaSparse/SuiteSparseGraphBLAS.jl/blob/90ab8e7bf05bde81d4d09a0abe8a08b8767a2552/src/SuiteSparseGraphBLAS.jl#L126

MATLAB requires it: I cannot use malloc/calloc/realloc/free. mexFunctions must use mxMalloc/mxCalloc/mxRealloc/mxFree: https://www.mathworks.com/help/matlab/apiref/mxmalloc.html states "To allocate memory in MATLAB applications, use mxMalloc instead of the ANSI® C malloc function".

When used inside MATLAB, all libraries must use internal functions of the MathWorks: utMalloc/utCalloc/utRealloc/utFree, but this is not documented. I must take in function pointers to these 4 functions. I handle this in SuiteSparse as a result.

Python requires it too (but I couldn't find the source code).

On the GPU, I will require rmm_malloc/rmm_calloc/rmm_realloc/rmm_free, which are ANSI C11 compatible wrappers for functions that rely on the Rapids Memory Manager.

TBB has its own too: scalable_malloc/scallable_calloc/scalable_realloc/scalable_free: https://software.intel.com/content/www/us/en/develop/documentation/tbb-documentation/top/intel-threading-building-blocks-developer-reference/memory-allocation/c-interface-to-scalable-allocator.html

I know of no major user of GraphBLAS that doesn't require the use of their own memory manager.

Defining the memory manager must be done at GrB_init time, since that function will likely need to malloc some memory. Currently, my GrB_init does not have to malloc anything, but it did in the past, to support the no-input GrB_wait() function and to create thread-local space for the no-input GrB_error(). Those are gone now, fortunately!

But in general, it seems likely that GrB_init would like to malloc something, which then gets freed by GrB_finalize.

eriknw commented 3 years ago

We call GxB_init with NumPy's allocators here: https://github.com/GraphBLAS/python-suitesparse-graphblas/blob/main/suitesparse_graphblas/utils.pyx

Note that pygraphblas uses GrB_init and grblas uses GxB_init with the NumPy allocators. There are a few reasons I prefer to use them from NumPy:

It allows Python to track memory usage (if enabled), which gives fancy and sometimes helpful debug info (and is generally a good idea)
Avoid pain in Windows. Yes, there are users in Windows. And, yes, if you try to use multiple libraries that were compiled separately, they may use different allocators (even if they both simply use malloc/free), which can cause a bunch of pain and confusion.
Lets us use NumPy's allocators if they were configured differently.
Also, this allows us to perform zero-copy transfer of arrays to and from SuiteSparse, which is very nice.

@DrTimothyAldenDavis, your current: GxB_init accepts an additional argument: whether the allocator is thread-safe or not. Why is this not in the proposal? Doesn't MATLAB need this?

I would also like to point out that Python allows you to use user-defined allocators, and NumPy is about to:

https://numpy.org/neps/nep-0049.html https://www.python.org/dev/peps/pep-0445/

There is a lot of discussion around these proposals and the potential benefits of accepting user-defined allocators. Notably, I want to point out that both NumPy and Python user-defined allocator functions accept a context argument. This flexibility is sometimes needed, and it future-proofs the API. This context can serve the same purpose as the "thread safe or not" argument in GxB_init.

This is NumPy's proposal in a nutshell:

/* The declaration of free differs from PyMemAllocatorEx */
typedef struct {
    void *ctx;
    void* (*malloc) (void *ctx, size_t size);
    void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
    void (*free) (void *ctx, void *ptr, size_t size);
} PyDataMemAllocator;

typedef struct {
    char name[128];  /* multiple of 64 to keep the struct aligned */
    PyDataMemAllocator allocator;
} PyDataMem_Handler;

const PyDataMem_Handler *PyDataMem_SetHandler(PyDataMem_Handler *handler)

DrTimothyAldenDavis commented 3 years ago

MATLAB’s managers are not threadsafe but that is a special case. I can write wrappers for the that call mxMalloc etc inside a critical section. That way I do not expose this ugliness to the world. So I’m suggesting a GrB init with the mode, and the 4 function pointers, and nothing else.

I actually only need malloc and free. I never calloc. And since internally I keep track of the sizes of all my malloced objects I can do my own realloc. I use the realloc function if provided but do my own if the function pointer is NULL.

But other implementations might want all 4 functions.

DrTimothyAldenDavis commented 3 years ago

Regarding the python context pointer: I don't think that will work to add the context pointer to the malloc/calloc/etc signatures. It's too disruptive. All the other methods can provide the 4 functions with the same signature as the ANSI C11 malloc/calloc/etc functions.

Couldn't Python place the ctx in a global pointer somewhere and then provide a function like this?

PyDataMem_Handler My_Handle ;   // a global
void *my_python_malloc (size_t size)
{
    return (My_Handle->malloc  (My_Handle->ctx, size)) ;
}

DrTimothyAldenDavis commented 2 years ago

This is required by #31

manoj63 commented 2 years ago

1) We should maintain the separation of concerns between application logic and performance. Shouldn't the right choice of allocator stay with the library and not the application programmer? The question to ask is what matrix characteristics the application programmer could provide the library to make this choice. Should calloc be made the default? 2) Performance concerns Could be addressed by sending a string of options, along the lines of Unix command line options. Rich set of C features to parse them. 3) We should maximize similarity between different language bindings of the GraphBLAS API. Putting allocator does not have a good equivalent in other languages. In C++, would it amount to passing a constructor for Matrices or vectors? 4) A reasonable line of inquiry would be to to ask Tim Davis 1) How it works today? 2) What would improve if he got the malloc/calloc/realloc specification? Other ways of achieving the same result without expanding the API and the headaches of these other ways.

DrTimothyAldenDavis commented 2 years ago

Ignore the thread-safe thing. That can be fixed outside of GraphBLAS. The allocator functions should be threadsafe. This option no longer appears.

This is something where the user application needs to be in control. The GraphBLAS library can’t make this decision. GraphBLAS is at the bottom of the food chain, so to speak. It must coexist with many packages above it.

Those other packages have to all agree on what allocators to use, if we want O(1) time and space sharing of data between them.

Many examples:

MATLAB requires that I use their internal allocators, utMalloc, utFree, etc. These are like mxMalloc and friends but at a lower level. If GraphBLAS couldn’t be forced to use utMalloc and friends, then it couldn’t be used inside MATLAB. Non starter. Say if instead GraphBLAS used only the ANSI C malloc and friends. Then the only way MathWorks could use GraphBLAS inside MATLAB would be to patch GraphBLAS somehow.

Using the GPU and unified shared memory requires that I use malloc-syntax wrappers for the Rapids Memory Manager. I can’t use the GPU otherwise. But if I decide to use these internally then the user application must either use them too, or be willing to give up O(1) time and space sharing.

Same is true for RedisGraph, and I think Python too.

-- Sent from Gmail Mobile

GraphBLAS / graphblas-api-c

BB-5: Pointers to allocators/deallocators #3