deephaven / deephaven-core

Deephaven Community Core
Other
255 stars 81 forks source link

Potential timestamp math confusion, python #2777

Open devinrsmith opened 2 years ago

devinrsmith commented 2 years ago
from deephaven import time_table

t1 = time_table("00:00:01").update_view("bin=upperBin(Timestamp, 5 * SECOND)")

from deephaven.time import SECOND

t2 = time_table("00:00:01").update_view("bin=upperBin(Timestamp, 5 * SECOND)")

t1 is likely what the user intended; t2 results in an integer overflow w/ 5 * SECOND due the import.

I'm not sure what the best solution is.

jmao-denver commented 2 years ago

In a generalized sense, this is related to #2306. i.e.

  1. formula expression parser should not allow Python symbols overshadow Java ones
  2. need a formal formula expression spec that documents all the supported built-in Java constants so as to make it less confusing for the users
  3. a transparent mapping of wrapped Python names (function, methods, constants, enums) to the Java ones and make the formula expression parser be aware of this mapping during expression compilation and code-generation.
jmao-denver commented 1 year ago

In a generalized sense, this is related to #2306. i.e.

  1. formula expression parser should not allow Python symbols overshadow Java ones
  2. need a formal formula expression spec that documents all the supported built-in Java constants so as to make it less confusing for the users
  3. a transparent mapping of wrapped Python names (function, methods, constants, enums) to the Java ones and make the formula expression parser be aware of this mapping during expression compilation and code-generation.

I have dug a little deeper and this is what I found. Because of the import, SECOND is resolved as a Python value and its type is inferred as Java integer, this in turn causes the QLP to generate code for the multiplication (5 * SECOND) to match the overload utility method public static int multiply(int a, int b). The solution we are considering for #2306 (attaching some meta info to the Python wrapper) won't solve this issue. @devinrsmith @rcaudy @chipkent Maybe we could safely change this method to return long instead of int to avoid the overflow?

rcaudy commented 1 year ago

Or maybe we could get rid of SECOND in Python? Or tell users not to to do this?

devinrsmith commented 1 year ago

Additionally, this causes additional boundary crossing. There are other similar cases (example from JJ):


from deephaven import empty_table, time_table, merge
from deephaven import constants as dhconst
import random

null_int = dhconst.NULL_INT
null_double = dhconst.NULL_DOUBLE

def randomletter() -> str:
    return random.choice(["A", "B", "C", "D"])

static_table = empty_table(100).update(["isin = randomInt(0, 10)", "source = randomInt(0, 10)"])

live_table = time_table("PT1S").update(["isin = randomInt(0, 10)", "source = randomInt(0, 10)", "key = randomletter()", "bid = randomDouble(50, 100)", "ask = randomDouble(50, 100)", "mid = bid + ask / 2"])

static_table = static_table.update_view(["key = `Q`", "bid = null_double", "ask = null_double", "mid = null_double"])

trigger = time_table("PT10S")

resultant_table = merge([static_table, live_table.drop_columns(["Timestamp"])]).snapshot_when(trigger_table=trigger)
chipkent commented 1 year ago

4388 Should resolve the original example because the SECOND symbol no longer exists ... but similar problems could still happen.

chipkent commented 4 months ago

I did a bit more digging on this item.

  1. 4388 solved the original problem of time constants like SECOND being in python by removing them.

  2. There are still constants in deephaven.constants. These include the null constants and max/min constant values.

How the constants are interpreted can be seen in this simple query.

import deephaven.constants as dhc
from deephaven import empty_table

cons = [c for c in dir(dhc) if not c.startswith('_') and c != "jpy"]

for c in cons:
    globals()[f"VAL_{c}"] = getattr(dhc,c)

t = empty_table(10).update([f"COL_{c}=VAL_{c}" for c in cons])

m = t.meta_table

Note:

Probably the only way to get around these problems is for the query to use the Java built-in values instead of Python values for a list of values.