Support for cuSPARSE/spmm_blockedell in cupy v11

Description

Since CUDA 11.2, cuSPARSE SpMM can perform sparse matrix-dense matrix multiplication with the sparse matrix stored in Blocked-ELL format (see https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/spmm_blockedell/spmm_blockedell_example.cpp).
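To make the request concrete, here is a minimal pure-Python sketch of the result I would expect such a routine to produce. It is only a CPU reference, not CuPy or cuSPARSE code. The function name `blocked_ell_spmm` and its parameters are hypothetical, and the layout is my reading of the cuSPARSE documentation: `ell_col_ind` holds one row of block-column indices per block row (with -1 marking padding), and `ell_values` is a dense `rows x ell_cols` array stored row by row.

```python
def blocked_ell_spmm(ell_col_ind, ell_values, block_size, dense_b):
    """Reference C = A @ B, with A given in an assumed Blocked-ELL layout.

    ell_col_ind: list of lists, one entry per block row; each entry lists the
                 block-column index of every stored block (-1 means padding).
    ell_values:  list of rows, each of length len(ell_col_ind[0]) * block_size.
    block_size:  edge length of the square blocks.
    dense_b:     dense right-hand matrix as a list of rows.
    """
    num_rows = len(ell_values)
    num_b_cols = len(dense_b[0])
    c = [[0.0] * num_b_cols for _ in range(num_rows)]
    for i in range(num_rows):
        block_row = i // block_size
        for slot, block_col in enumerate(ell_col_ind[block_row]):
            if block_col < 0:
                continue  # padding block, contributes nothing
            for k in range(block_size):
                a = ell_values[i][slot * block_size + k]
                b_row = dense_b[block_col * block_size + k]
                for j in range(num_b_cols):
                    c[i][j] += a * b_row[j]
    return c


# 4x4 matrix A with 2x2 blocks and one stored block per block row:
#   block row 0 stores block column 1, block row 1 stores block column 0.
ell_col_ind = [[1], [0]]
ell_values = [[1, 2], [3, 4], [5, 6], [7, 8]]
dense_b = [[1, 0], [0, 1], [1, 1], [2, 0]]
result = blocked_ell_spmm(ell_col_ind, ell_values, 2, dense_b)
```

With these inputs, A corresponds to the dense matrix [[0, 0, 1, 2], [0, 0, 3, 4], [5, 6, 0, 0], [7, 8, 0, 0]], so `result` can be checked against an ordinary dense product.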

This feature does not seem to be exposed in CuPy v11.x.

Am I missing something, or has this particular functionality of cuSPARSE SpMM not been wrapped in CuPy yet? Is there a plan to do so?

Thanks in advance.

Additional Information

I have looked into adding spmm_blockedell and the related routines by modifying cupy_backends (cusparse.pyx, cusparse.cpp, cupy_cusparse.h), but with only minimal knowledge of the CuPy source code I am not sure whether this approach will get me anywhere. The biggest hurdle seems to be adding the Python wrappers needed for cusparseCreateBlockedEll and similar functions.

Is there a simpler way to get spmm_blockedell to work within CuPy?

If modifying cupy_backends is the way to go, how are the Python wrappers generated?

Some references:

https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/spmm_blockedell/spmm_blockedell_example.cpp

https://github.com/NVIDIA/CUDALibrarySamples/blob/master/cuSPARSE/dense2sparse_blockedell/dense2sparse_blockedell_example.c

https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spmm