Open chipkent opened 12 months ago
To follow up on Chip's initial comment on this post, here's more motivation to make this happen.
I wrote up code that calculates Aroon indicators for some finance data. Here's the code:
from deephaven.updateby import rolling_min_time, rolling_max_time, rolling_group_time
from deephaven.plot.figure import Figure
from deephaven.time import to_j_instant, to_np_datetime64
from deephaven import read_csv
import numpy.typing as npt
import numpy as np
crypto_table = read_csv("https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/CryptoTrades_20210922.csv")
min_25min = rolling_min_time(ts_col="Timestamp", cols=["Min_25m = Price"], rev_time="PT25m")
max_25min = rolling_max_time(ts_col="Timestamp", cols=["Max_25m = Price"], rev_time="PT25m")
group_25min = rolling_group_time(ts_col="Timestamp", cols=["Prices_25m = Price", "Timestamps_25m = Timestamp"], rev_time="PT25m")
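def aroon(prices, times) -> npt.NDArray[np.double]:
    # Locate the highest and lowest price in the 25-minute window.
    prices = np.array(prices)
    maxprice_idx = prices.argmax()
    minprice_idx = prices.argmin()
    tmax = to_np_datetime64(times[maxprice_idx])
    tmin = to_np_datetime64(times[minprice_idx])
    tlast = to_np_datetime64(times[-1])
    # Minutes elapsed between each extreme and the most recent row.
    tmin_diff = float((tlast - tmin) / np.timedelta64(1, "m"))
    tmax_diff = float((tlast - tmax) / np.timedelta64(1, "m"))
    # Return an array so the result matches the declared return type.
    return np.array([tmin_diff, tmax_diff], dtype=np.double)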
crypto_updated = crypto_table.update_by(
    ops=[min_25min, max_25min, group_25min],
    by=["Instrument", "Exchange"]
).update_view(
    formulas=[
        "Aroon = (double[])aroon(Prices_25m, Timestamps_25m)",
        "AroonDown = ((25 - (double)Aroon[0]) / 25) * 100",
        "AroonUp = ((25 - (double)Aroon[1]) / 25) * 100",
    ]
).drop_columns(
    cols=["Aroon", "Timestamps_25m", "Prices_25m", "Min_25m", "Max_25m"]
)
eth_coinbase_aroon = crypto_updated.where(["Instrument == `ETH/USD`", "Exchange == `coinbase-pro`"])
price_plot = Figure().\
    plot_xy(series_name="Price", t=eth_coinbase_aroon, x="Timestamp", y="Price").\
    chart_title(title="Price").\
    show()
aroon_plot = Figure().\
    plot_xy(series_name="Aroon Up", t=eth_coinbase_aroon, x="Timestamp", y="AroonUp").\
    plot_xy(series_name="Aroon Down", t=eth_coinbase_aroon, x="Timestamp", y="AroonDown").\
    chart_title(title="Aroon Indicators").\
    show()
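For anyone who wants to check the arithmetic without a Deephaven session, here is a minimal sketch of the same per-window calculation on plain NumPy arrays. The function name `aroon_from_window` and its `minutes_ago` argument are made up for this illustration and are not part of the code above or of any Deephaven API; it assumes the timestamps have already been converted to "minutes before the latest row".

```python
import numpy as np


def aroon_from_window(prices, minutes_ago, window=25.0):
    """Return (aroon_down, aroon_up) for a single look-back window.

    prices      -- prices inside the window, oldest first
    minutes_ago -- minutes between each observation and the latest one
    window      -- look-back length in minutes (25 in the code above)

    Hypothetical helper for illustration only.
    """
    prices = np.asarray(prices, dtype=float)
    minutes_ago = np.asarray(minutes_ago, dtype=float)
    # argmin/argmax return the first (oldest) occurrence on ties,
    # the same behavior the aroon() function above relies on.
    since_low = minutes_ago[prices.argmin()]
    since_high = minutes_ago[prices.argmax()]
    aroon_down = 100.0 * (window - since_low) / window
    aroon_up = 100.0 * (window - since_high) / window
    return aroon_down, aroon_up


down, up = aroon_from_window(
    prices=[10.0, 12.0, 9.0, 11.0],
    minutes_ago=[20.0, 15.0, 5.0, 0.0],
)
print(down, up)
```

In this toy window the low is 5 minutes old and the high is 15 minutes old, so Aroon Down works out to 80 and Aroon Up to 40.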
The table I'm doing this on has 1M rows. The code, as-is, uses update_view so the calculations happen on demand. That produces the resulting table quickly, but the output then takes a long time to render. If I change it to an update instead, the operation just hangs for 10+ minutes.
The code is slow for some fairly obvious reasons, mostly because it has to use rolling_group_time with a custom Python function. It would be amazing if I could implement a custom aggregation/UpdateByOperation that could do it more efficiently.
If there's a better way to implement this currently, I'm all ears. There are probably ways to improve the code as-is, but I can't think of a way to make this operation efficient in the current version of Deephaven.
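To make the efficiency argument concrete, here is a rough sketch of the kind of incremental state a custom operation could keep, written in plain Python rather than against any Deephaven interface. `RollingExtreme` and `aroon_stream` are hypothetical names invented for this example. It assumes rows arrive in timestamp order (append-only), that times are plain numbers in minutes, and that the window is closed on both ends, i.e. `[t - window, t]`; whether that matches the exact bounds of `rolling_group_time` is an assumption.

```python
from collections import deque


class RollingExtreme:
    """Track when the max (or min) of a trailing time window occurred.

    Hypothetical illustration, not a Deephaven API. Each row is pushed
    and popped at most once, so the amortized cost per row is O(1)
    instead of rescanning the whole window.
    """

    def __init__(self, window, find_max):
        self.window = window
        self.find_max = find_max
        self._queue = deque()  # (time, value) pairs, current extreme at the left

    def _worse(self, old, new):
        # Strict comparison keeps the oldest row on ties, like argmax/argmin.
        return old < new if self.find_max else old > new

    def add(self, time, value):
        """Add a row and return the time of the current window extreme."""
        # Older candidates beaten by the new value can never win again.
        while self._queue and self._worse(self._queue[-1][1], value):
            self._queue.pop()
        self._queue.append((time, value))
        # Evict candidates that fell out of the assumed [time - window, time] range.
        while self._queue[0][0] < time - self.window:
            self._queue.popleft()
        return self._queue[0][0]


def aroon_stream(rows, window=25.0):
    """Yield-free helper: return [(aroon_down, aroon_up), ...] for time-ordered rows."""
    highs = RollingExtreme(window, find_max=True)
    lows = RollingExtreme(window, find_max=False)
    out = []
    for time, price in rows:
        since_high = time - highs.add(time, price)
        since_low = time - lows.add(time, price)
        aroon_down = 100.0 * (window - since_low) / window
        aroon_up = 100.0 * (window - since_high) / window
        out.append((aroon_down, aroon_up))
    return out


rows = [(0, 10.0), (10, 12.0), (20, 9.0), (30, 11.0), (40, 8.0)]
for row, result in zip(rows, aroon_stream(rows)):
    print(row, result)
```

This only covers appended rows for a single key; a real operation would also need one such state per `by` group and a way to handle modified or removed rows, which is exactly the part that is hard to express with rolling_group_time plus a Python function.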
Current agg_by, update_by, and range_join aggregations:
To deal with these limitations, operations such as AggFormula and AggGroup + update are suggested to users. These steps can create viable output, but they suffer from limited functionality (e.g. #4194, #4195, #4052). Additionally, these solutions do large recomputations, rather than computations on changes, which makes them inefficient.
There should be agg_by, update_by, and range_join operations that:
agg_by, update_by, and range_join
As an example, let's consider a weighted absolute sum. Pseudocode may look something like: