deephaven / deephaven-core

Deephaven Community Core
Other
257 stars 80 forks source link

Is there a pd.DataFrame.rolling(<window>).apply(<some_func>) equivalent? #1796

Open kzk2000 opened 2 years ago

kzk2000 commented 2 years ago

As heavy pandas user, and after searching the docs for it, I'm still having trouble of doing something simple as below using Deephaven's dynamic tables. Is there a native way to apply arbitrary Python funcs as a rolling window? NOTE: this is not an EMA that can be updated with just the last tick.

Thanks for any pointers in advance!

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', 200)

df = pd.DataFrame(np.random.randn(10,1), columns=list('X'))
df['rolling_mean'] = df['X'].rolling(3).mean()
df['rolling_median'] = df['X'].rolling(3).median()
df['rolling_sum'] = df['X'].rolling(3).sum()
df['rolling_sum_with_apply'] = df['X'].rolling(3).apply(lambda x: x.sum())
print(df)

yields

          X  rolling_mean  rolling_median  rolling_sum  rolling_sum_with_apply
0 -1.068977           NaN             NaN          NaN                     NaN
1 -1.531186           NaN             NaN          NaN                     NaN
2  0.642410     -0.652585       -1.068977    -1.957754               -1.957754
3 -0.358194     -0.415657       -0.358194    -1.246970               -1.246970
4 -0.360693     -0.025493       -0.358194    -0.076478               -0.076478
5 -1.255688     -0.658192       -0.360693    -1.974575               -1.974575
6 -1.308981     -0.975121       -1.255688    -2.925363               -2.925363
7  0.427160     -0.712503       -1.255688    -2.137510               -2.137510
8 -1.123338     -0.668386       -1.123338    -2.005159               -2.005159
9  0.373535     -0.107548        0.373535    -0.322643               -0.322643
hythloda commented 2 years ago

Here is one way to do something similar. We currently do not have a way to supply a rolling window to an aggregator so I am going to leave this issue open. In the mean time, below explains my thinking around your issue.

I made a new table that is the np random values like you provided in your example.

Then a function my_function that can do whatever calculation needed. This just sums array elements but can be a function of any complexity. Just use any python function notation you like.

Since you wanted to add to the original table I have the meat as one line. The _ will access the array elements of the column and .subVector in this case takes 3 values for the calculations.

We have built in support for mean(avg) , median and sum.

Since the first two rows are NAN, just show values that makes sense by applying a where filter out the first two rows.
Then drop those from the table since they clutter the screen.

import numpy as np
from deephaven.TableTools import newTable, doubleCol

def my_function(A):
    total = 0
    for i in range(0, len(A)):
        total = total + A[i]
    return total

dh_table = newTable(\
    doubleCol("X", np.random.randn(10,1)))\
    .update("A= X_.subVector(i-2,i+1)",\
        "rolling_mean = avg(A)", \
        "rolling_median =median(A)", \
        "rolling_sum = sum(A)", \
        "FuncX=my_function(A)")\
    .where("X_[i-2]>=-10").dropColumns("A")