Support numba @guvectorize vector functions in the query language

Numba's guvectorize decorator is extremely powerful. It provides a few key features not found in vectorize, which DH already supports:

Support for vector inputs
Support for vector outputs
Support for numpy inputs and operations

The code below illustrates what is possible. The one notable feature that numba itself does not support is fixed-size outputs (https://github.com/numba/numba/issues/1668).

On the DH side, the query language needs to support outputs from the guvectorize functions. Currently, all guvectorize output columns are typed as java.lang.Object instead of the proper scalar or array type.
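One way to think about the missing type information: the guvectorize layout string already says whether the output is a scalar or an array. The sketch below is only an illustration of that idea in plain Python. `parse_layout` and `output_kind` are hypothetical helpers invented for this example, not part of DH or numba, and the real fix may derive the type differently.

```python
import re

def parse_layout(layout):
    """Split a layout such as "(m),()->(m)" into input and output core dims."""
    inputs, sep, outputs = layout.replace(" ", "").partition("->")
    if not sep:
        raise ValueError(f"layout has no '->': {layout!r}")

    def dims(side):
        # Each parenthesized group is one argument; "()" means a scalar.
        return [
            tuple(d for d in group.split(",") if d)
            for group in re.findall(r"\(([^()]*)\)", side)
        ]

    return dims(inputs), dims(outputs)

def output_kind(layout):
    """Return "scalar" or "array" for a layout with a single output (hypothetical helper)."""
    _, outputs = parse_layout(layout)
    if len(outputs) != 1:
        raise ValueError("this sketch only handles a single output")
    return "array" if outputs[0] else "scalar"

print(output_kind("(m)->()"))       # scalar
print(output_kind("(m),()->(m)"))   # array
print(output_kind("(m),(n)->(n)"))  # array
```

Combined with the element type from the numba signature (int64 in the examples here), this would be enough to choose between a scalar column type and a primitive array column type instead of java.lang.Object.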

Making this fix will allow high-performance creation of native Java arrays that can be used seamlessly in operations such as grouping. The array support also makes it easier to use inputs directly as numpy arrays.


import numpy as np
from numba import vectorize, jit, guvectorize, int64

a = np.arange(5, dtype=np.int64)

@jit(int64(int64[:]),nopython=True)
def f(x):
    r = 0
    for xi in x:
        r += xi
    return r

print(f(a))

#vector input to scalar output function (m)->()
@guvectorize([(int64[:],int64[:])],"(m)->()",nopython=True)
def g(x, res):
    res[0] = 0
    for xi in x:
        res[0] += xi

print(g(a))

from deephaven import empty_table

t = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g(Y)")
m = t.meta_table

#vector and scalar input to vector output function
@guvectorize([(int64[:],int64,int64[:])],"(m),()->(m)",nopython=True)
def g2(x, y, res):
    for i in range(len(x)):
        res[i] = x[i] + y

print(g2(a,2))

t2 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g2(Y,2)")
m2 = t2.meta_table

# NOTE: the following does not work according to this thread from 7 years ago:
# https://numba-users.continuum.narkive.com/7OAX8Suv/numba-guvectorize-with-fixed-size-output-array
# but the latest numpy Generalized Universal Function API does seem to support frozen dimensions
# https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html#generalized-universal-function-api
# There is an old numba ticket about it
# https://github.com/numba/numba/issues/1668
# Possibly we could contribute a fix

# fails with: bad token in signature "2"

# #vector input to fixed-length vector output function
# @guvectorize([(int64[:],int64[:])],"(m)->(2)",nopython=True)
# def g3(x, res):
#     res[0] = min(x)
#     res[1] = max(x)

# print(g3(a))

# t3 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g3(Y)")
# m3 = t3.meta_table

# ** Workaround **

dummy = np.array([0, 0], dtype=np.int64)

#vector input to fixed-length vector output function -- second arg is a dummy just to get a fixed size output
@guvectorize([(int64[:],int64[:],int64[:])],"(m),(n)->(n)",nopython=True)
def g4(x, dummy, res):
    res[0] = min(x)
    res[1] = max(x)

print(g4(a,dummy))

t4 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g4(Y,dummy)")
m4 = t4.meta_table

# example using numpy

#vector input to fixed-length vector output function -- second arg is a dummy just to get a fixed size output
@guvectorize([(int64[:],int64[:],int64[:])],"(m),(n)->(n)",nopython=True)
def g5(x, dummy, res):
    res[0] = np.min(x)
    res[1] = np.max(x)

print(g5(a,dummy))

t5 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g5(Y,dummy)")
m5 = t5.meta_table

# example using numpy

#vector input to same-length vector output function
@guvectorize([(int64[:],int64[:])],"(m)->(m)",nopython=True)
def g6(x, res):
    res[:] = x+5

print(g6(a))

t6 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g6(Y)")
m6 = t6.meta_table
