NVIDIA / fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks, leveraging the RAPIDS AI project, Numba, cuDF, and Dask.

[FEA] Ewm function for finite series #100

Open jumana51 opened 4 years ago

jumana51 commented 4 years ago

Is your feature request related to a problem? Please describe.

The pandas EWM function accepts adjust as a parameter. For financial time series, the data is not infinite. Therefore, adjust=False should be used. The discrepancy exists only on the first few data points.

For example, for n = 200, rounded to 2 decimal places:

Using gQuant:

200      107.26
201      107.28
202      107.30
203      107.31
204      107.30
...
28123    158.06
28124    158.07
28125    158.07
28126    158.07

Using pandas adjust=False:

200      107.07
201      107.09
202      107.11
203      107.12
204      107.11
...
28123    158.06
28124    158.07
28125    158.07
28126    158.07
28127    158.07
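To illustrate why a gap of this kind shows up only near the start of a series, here is a minimal sketch comparing the two pandas variants. It uses synthetic random-walk data, not the series from this issue, and it assumes that n = 200 above means span=200. Whether gQuant's current output corresponds to adjust=True is not established here.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk "prices": illustrative data only, not the series from this issue.
rng = np.random.default_rng(0)
prices = pd.Series(100.0 + rng.normal(0.0, 1.0, size=5000).cumsum())

span = 200  # assumption: "n = 200" in the comment above is the EWM span

# adjust=True: y_t = sum_i (1-a)^i * x_{t-i} / sum_i (1-a)^i,
# i.e. the weights are renormalized over the history available so far.
adjusted = prices.ewm(span=span, adjust=True).mean()

# adjust=False: y_0 = x_0, then y_t = (1-a) * y_{t-1} + a * x_t.
recursive = prices.ewm(span=span, adjust=False).mean()

# The two variants differ by a term that decays like (1-a)^t.
gap = (adjusted - recursive).abs()
early_gap = gap.iloc[:1000].max()
late_gap = gap.iloc[-1000:].max()

print(f"max gap, first 1000 points: {early_gap:.4f}")
print(f"max gap, last 1000 points:  {late_gap:.2e}")
```

With a span of 200 the smoothing factor is a = 2 / 201, so the difference between the two results shrinks geometrically and is negligible a few thousand points in.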

Describe the solution you'd like

Allow the gQuant version of EWM to accept the adjust parameter and change the calculation to match the formula used by pandas. I am not sure whether the recursive calculation for adjust=False is amenable to the GPU.
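On the GPU question: the adjust=False recursion is a first-order linear recurrence, and recurrences of that form can be written as a prefix scan over an associative operator, which is a standard data-parallel pattern. The sketch below shows the idea in NumPy on the CPU, as a stand-in for a GPU scan. The name `ewm_recursive_scan` is hypothetical; it is not a gQuant or cuDF function, and this is not a claim about how either library implements EWM.

```python
import numpy as np
import pandas as pd


def ewm_recursive_scan(x, alpha):
    """adjust=False EWM mean via a Hillis-Steele style inclusive scan.

    Each step t is the affine map y -> a_t * y + b_t with a_t = 1 - alpha and
    b_t = alpha * x_t. Composing an earlier map (a_l, b_l) with a later one
    (a_r, b_r) gives (a_r * a_l, a_r * b_l + b_r), which is associative.
    Assumes a non-empty 1-D input without NaNs.
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    a = np.full(n, 1.0 - alpha)
    b = alpha * x
    # Seed the recursion with y_0 = x_0 by making the first map constant.
    a[0] = 0.0
    b[0] = x[0]

    d = 1
    while d < n:
        # Every element combines with the partial result d positions earlier.
        # All updates within one pass are independent of each other.
        new_a = a.copy()
        new_b = b.copy()
        new_b[d:] = a[d:] * b[:-d] + b[d:]
        new_a[d:] = a[d:] * a[:-d]
        a, b = new_a, new_b
        d *= 2
    return b


# Check against pandas on synthetic data (illustrative values only).
rng = np.random.default_rng(1)
values = 100.0 + rng.normal(0.0, 1.0, size=1000).cumsum()
alpha = 2.0 / (200 + 1)  # smoothing factor for span=200

scan_result = ewm_recursive_scan(values, alpha)
pandas_result = pd.Series(values).ewm(alpha=alpha, adjust=False).mean().to_numpy()
print(np.allclose(scan_result, pandas_result))
```

The loop runs about log2(n) passes, and each pass is an elementwise operation over the whole array. A real GPU version would use a work-efficient scan rather than this simple form, which does O(n log n) total work.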

Describe alternatives you've considered

Convert the cudf series to pandas, use the pandas EWM function, and convert back to cudf.
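A minimal sketch of that round trip, assuming the input is a cudf Series (which provides `to_pandas()`) and that the caller passes `cudf.from_pandas` to move the result back to the device. The helper name `ewm_mean_on_host` and its `to_device` parameter are hypothetical, introduced only for illustration. The function falls back to treating the input as a pandas Series so it can be tried without a GPU.

```python
import pandas as pd


def ewm_mean_on_host(series, span, adjust=False, to_device=None):
    """Compute an EWM mean with pandas and optionally move the result back.

    `series` may be a cudf Series (copied to host via to_pandas()) or a plain
    pandas Series. `to_device` is an optional callable such as
    cudf.from_pandas; when omitted, the pandas result is returned.
    """
    # Device-to-host copy when the object supports it.
    host = series.to_pandas() if hasattr(series, "to_pandas") else series
    result = host.ewm(span=span, adjust=adjust).mean()
    # Host-to-device copy only if the caller asked for it.
    return to_device(result) if to_device is not None else result


# CPU-only usage with a pandas Series (illustrative values).
closes = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])
smoothed = ewm_mean_on_host(closes, span=3, adjust=False)
print(smoothed)

# With cudf installed, the round trip would look like:
#   import cudf
#   gpu_smoothed = ewm_mean_on_host(gpu_series, span=200, to_device=cudf.from_pandas)
```

Each call copies the full series across the host-device boundary twice, which is the main cost of this workaround on long series.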

Additional context

References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.ewm.html
https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#stats-moments-exponentially-weighted

yidong72 commented 4 years ago

@jumana51, I have an issue opened for official EWM function support. Check

https://github.com/rapidsai/cudf/issues/1263

I will look into this issue and add an adjust flag.

jumana51 commented 4 years ago

@yidong72 Thanks.

A general question: Lots of statistics functions were deprecated in Pandas and moved to statsmodels. I think it was a good strategy to keep the separation between data wrangling in pandas vs. statistical calculations. For cudf / gQuant, should we keep the same separation? Just my 2 cents.

BTW, I wish I had discovered this library a bit earlier as I ended up writing my own functions while porting from pandas to cudf :|

yidong72 commented 4 years ago

gQuant organizes the workflow into Task Nodes, in which you can implement different statistical calculations. That's how we keep things loosely coupled.

We are currently working toward a major gQuant release. You can review some of the tutorials in PR #89. Check this example: https://github.com/yidong72/gQuant/blob/branch-client/notebooks/01_tutorial.ipynb

Hopefully, gQuant is useful to you.