deephaven / deephaven-core

Deephaven Community Core
Other
254 stars 80 forks source link

Make it easier for Python users to work with DH columns of vector types. #3830

Open jmao-denver opened 1 year ago

jmao-denver commented 1 year ago

Columns of vector types can be created by group-by aggregations but they are not easily consumable for the Python users.

Issues:

  1. when used as an argument to a Python func, DH converts the vector to a Java array of the same component type, then JPY wraps it as an instance of a dynamically created Python class with support of iteration and subscription (no slicing) but nothing more.
  2. since the value of the column is a vector, a Python user may naturally (although wrongly) think that it ought to behave like a Python list (or more generally sequence) and use it as such in query strings, e.g. "A = VectorColumn[-1]" or "B = VectorColumn[1: 2]", only to find out that it doesn't work and have to search for equivalent built-in Java functions.

Possible solutions: For issue 1:

For issue 2:

jmao-denver commented 1 year ago

@chipkent @rcaudy @devinrsmith Please comment!

rcaudy commented 1 year ago

I think doing something about issue 2 would be a very nice enhancement.

devinrsmith commented 1 year ago

There may be potential to create small python wrappers around the Vectors that implement the python Sequence / Iterator, and to pass the arguments to python as such. It wouldn't be too dis-similar from how we use JObjectWrappers. There may be hooks that need to be created in jpy to make it easier, but I'm pretty confident we could get this working today. This might not be the best performance though, as each iterating element would be a boundary crossing.

devinrsmith commented 1 year ago

Alternatively, we could imagine other ways of calling into python that either use the buffer protocol, or do a copy into numpy or other array instead.

Edit: this looks like what you are saying in "the engine automatically creates a numpy array during formula evaluation"

niloc132 commented 1 year ago

For at least subsets of case 2, we could decorate our own python functions which merely wrap matching Java methods in such a way that as the python function is queries by the parser, we could identify what that java method would be. Something vaguely like

@j_wrapper(_JDateTimeUtils, "today")
def today(tz: TimeZone) -> str:
    """ Provides the current date string according to the current clock.
    Under most circumstances, this method will return the date according to current system time,
    but during replay simulations, this method can return the date according to replay time.

    Args:
        tz (TimeZone): Time zone to use when determining the date.

    Returns:
        Date string

    Raises:
        DHError
    """
    try:
        return _JDateTimeUtils.today(tz)
    except Exception as e:
        raise DHError(e) from e

I'm not sure if the _JDateTimeUtils reference would be usable from python, but it would be handy to make these easy to write - let the decorator query the .jclass property, and make that fully qualified name and method name available to the parser as it looks at the today() function.

See also https://github.com/deephaven/deephaven-core/issues/3800