deephaven / deephaven-core

Deephaven Community Core

Support numba @guvectorize vector functions in the query language #4562

Closed: chipkent closed this issue 11 months ago

chipkent commented 11 months ago

Numba's @guvectorize decorator is extremely powerful. It provides a few key features not found in @vectorize, which DH currently supports:

  1. Support for vector inputs
  2. Support for vector outputs
  3. Support for numpy inputs and operations
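The following is a minimal sketch of what the first two features mean in practice, written so it runs without numba. It uses NumPy's `np.vectorize` with its `signature` argument, which accepts the same generalized-ufunc layout strings as guvectorize (here `"(m)->()"` and `"(m),()->(m)"`, the layouts used by `g` and `g2` in the code below). Keep one difference in mind: `np.vectorize` is a pure-Python loop that *returns* its result, while a guvectorize kernel writes into a preallocated output argument. This sketch therefore illustrates only the looping and shape semantics, not numba's performance. The names `vec_sum`, `vec_add`, `g_like`, and `g2_like` are hypothetical.

```python
import numpy as np

# Core function for layout "(m)->()": reduce one vector to one scalar.
def vec_sum(x):
    return x.sum()

# Core function for layout "(m),()->(m)": add a scalar to every element of a vector.
def vec_add(x, y):
    return x + y

# With a gufunc-style signature, np.vectorize applies the core function to each
# core slice and loops over all leading (non-core) dimensions, like a gufunc does.
g_like = np.vectorize(vec_sum, signature="(m)->()")
g2_like = np.vectorize(vec_add, signature="(m),()->(m)")

# 3 rows, each a length-4 int64 vector (think: one grouped Y array per row).
rows = np.arange(12, dtype=np.int64).reshape(3, 4)

sums = g_like(rows)         # vector input -> scalar output, one per row
shifted = g2_like(rows, 2)  # vector + scalar input -> vector output, one per row

print(sums)
print(shifted)
```

Each row of `rows` plays the role of one grouped `Y` vector in the Deephaven examples. The `"(m)->()"` layout yields one scalar per row, and `"(m),()->(m)"` yields one vector per row, with the scalar `2` broadcast to every row.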

The following code illustrates what is possible. The one notable feature that numba does not support is fixed-size outputs (https://github.com/numba/numba/issues/1668).

On the DH side, the query language needs to produce correctly typed columns from guvectorize functions. Currently, every guvectorize output column is java.lang.Object instead of the proper scalar or array type.
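As a rough illustration of what "the proper scalar or array type" could mean, here is a hypothetical sketch (not Deephaven's implementation) that derives a Java column type from a guvectorize output's dtype and its core dimensions in the layout string. It assumes that an output with zero core dimensions maps to a Java primitive, an output with one core dimension maps to a Java primitive array, and anything else keeps today's java.lang.Object fallback. The mapping table and the name `java_output_type` are illustrative only.

```python
# Hypothetical mapping from numpy dtype names to Java primitive type names.
_NP_TO_JAVA = {
    "int8": "byte",
    "int16": "short",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
}


def java_output_type(out_dtype, out_core_dims):
    """Return the Java column type a gufunc output could map to (illustrative only).

    out_dtype: numpy dtype name of the output argument, e.g. "int64".
    out_core_dims: the output's core dimensions from the layout string,
        e.g. () for "(m)->()" or ("m",) for "(m),()->(m)".
    """
    base = _NP_TO_JAVA.get(out_dtype)
    if base is None:
        return "java.lang.Object"  # unknown dtype: keep the current fallback
    if len(out_core_dims) == 0:
        return base  # scalar output, e.g. "long"
    if len(out_core_dims) == 1:
        return base + "[]"  # vector output, e.g. "long[]"
    return "java.lang.Object"  # multi-dimensional output: no obvious column type
```

One design point this sketch reflects: in `g` below, `res` is declared as `int64[:]`, but the layout `"(m)->()"` makes each output a scalar. The column type therefore has to come from the layout's core dimensions, not from the numba array type of the output argument.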

Making this fix will allow high-performance creation of native Java arrays that can be used seamlessly in operations such as grouping. The array support also makes it easier to use inputs directly as numpy arrays.


import numpy as np
from numba import vectorize, jit, guvectorize, int64

a = np.arange(5, dtype=np.int64)

@jit(int64(int64[:]),nopython=True)
def f(x):
    r = 0
    for xi in x:
        r += xi
    return r

print(f(a))

#vector input to scalar output function (m)->()
@guvectorize([(int64[:],int64[:])],"(m)->()",nopython=True)
def g(x, res):
    res[0] = 0
    for xi in x:
        res[0] += xi

print(g(a))

from deephaven import empty_table

t = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g(Y)")
m = t.meta_table

#vector and scalar input to vector output function
@guvectorize([(int64[:],int64,int64[:])],"(m),()->(m)",nopython=True)
def g2(x, y, res):
    for i in range(len(x)):
        res[i] = x[i] + y

print(g2(a,2))

t2 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g2(Y,2)")
m2 = t2.meta_table

# NOTE: the following does not work according to this thread from 7 years ago:
# https://numba-users.continuum.narkive.com/7OAX8Suv/numba-guvectorize-with-fixed-size-output-array
# but the latest numpy Generalized Universal Function API does seem to support frozen dimensions
# https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html#generalized-universal-function-api
# There is an old numba ticket about it
# https://github.com/numba/numba/issues/1668
# Possibly we could contribute a fix

# fails with: bad token in signature "2"

# #vector input to fixed-length vector output function
# @guvectorize([(int64[:],int64[:])],"(m)->(2)",nopython=True)
# def g3(x, res):
#     res[0] = min(x)
#     res[1] = max(x)

# print(g3(a))

# t3 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g3(Y)")
# m3 = t3.meta_table

# ** Workaround **

dummy = np.array([0, 0], dtype=np.int64)

#vector input to fixed-length vector output function -- the second arg is a dummy whose length fixes the output size
@guvectorize([(int64[:],int64[:],int64[:])],"(m),(n)->(n)",nopython=True)
def g4(x, dummy, res):
    res[0] = min(x)
    res[1] = max(x)

print(g4(a,dummy))

t4 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g4(Y,dummy)")
m4 = t4.meta_table

# example using numpy

#vector input to fixed-length vector output function -- the second arg is a dummy whose length fixes the output size
@guvectorize([(int64[:],int64[:],int64[:])],"(m),(n)->(n)",nopython=True)
def g5(x, dummy, res):
    res[0] = np.min(x)
    res[1] = np.max(x)

print(g5(a,dummy))

t5 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g5(Y,dummy)")
m5 = t5.meta_table

# example using numpy

#vector input to same-length vector output function
@guvectorize([(int64[:],int64[:])],"(m)->(m)",nopython=True)
def g6(x, res):
    res[:] = x+5

print(g6(a))

t6 = empty_table(10).update(["X=i%3", "Y=i"]).group_by("X").update("Z=g6(Y)")
m6 = t6.meta_table
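The `g4`/`g5` workaround works because gufunc core dimensions are bound by name from the inputs. Passing `dummy` (length 2) binds `n = 2`, so the `(n)` output always has length 2. The following pure-Python sketch models that binding. It also accepts integer (frozen) dimensions such as `"(m)->(2)"`, which the NumPy generalized-ufunc C API documents but numba's guvectorize currently rejects with `bad token in signature "2"`. This is a simplified model with hypothetical names (`parse_signature`, `output_core_shapes`), not numba's or NumPy's actual parser. It ignores NumPy's flexible `?` dimensions and the broadcasting of loop dimensions.

```python
import re

# Each argument in a layout string is "(dims)", where dims is a comma-separated
# list of names (bound from input shapes) or integers (frozen dimensions).
_ARG = re.compile(r"\(([^()]*)\)")


def parse_signature(sig):
    """Split a layout string into input and output lists of core-dimension tuples."""
    ins, outs = sig.replace(" ", "").split("->")

    def parse_side(side):
        args = []
        for body in _ARG.findall(side):
            dims = [d for d in body.split(",") if d]
            args.append(tuple(int(d) if d.isdigit() else d for d in dims))
        return args

    return parse_side(ins), parse_side(outs)


def output_core_shapes(sig, *input_core_shapes):
    """Bind named dims from the inputs' core shapes, then resolve the output shapes."""
    in_dims, out_dims = parse_signature(sig)
    if len(in_dims) != len(input_core_shapes):
        raise ValueError("wrong number of inputs")
    bound = {}
    for dims, shape in zip(in_dims, input_core_shapes):
        if len(dims) != len(shape):
            raise ValueError(f"expected {len(dims)} core dims, got {len(shape)}")
        for d, size in zip(dims, shape):
            if isinstance(d, int):
                if d != size:
                    raise ValueError(f"frozen dim {d} does not match size {size}")
            elif bound.setdefault(d, size) != size:
                raise ValueError(f"inconsistent sizes for dim {d!r}")
    result = []
    for dims in out_dims:
        shape = []
        for d in dims:
            if isinstance(d, int):
                shape.append(d)  # frozen: size fixed by the signature itself
            elif d in bound:
                shape.append(bound[d])  # named: size taken from an input
            else:
                raise ValueError(f"output dim {d!r} is not bound by any input")
        result.append(tuple(shape))
    return result


# Frozen output dimension (what g3 wants) vs. the dummy-argument workaround (g4).
print(output_core_shapes("(m)->(2)", (5,)))
print(output_core_shapes("(m),(n)->(n)", (5,), (2,)))
```

In this model, both calls resolve the output core shape to `(2,)`. The frozen form simply states the size in the signature rather than smuggling it in through an extra argument.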
chipkent commented 11 months ago

Docs for guvectorize are bad. Here are some better ones:

  1. https://numba.pydata.org/numba-doc/latest/user/vectorize.html#vectorize
  2. https://numba.pydata.org/numba-doc/0.17.0/user/vectorize.html
  3. https://numba-users.continuum.narkive.com/7OAX8Suv/numba-guvectorize-with-fixed-size-output-array
  4. https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html#generalized-universal-function-api