fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Add basic expanding ema #485

MikaelUmaN closed this issue 4 years ago

MikaelUmaN commented 4 years ago

Hi.

I've seen the idea to add a separate package for Deedle.Math: https://github.com/fslaborg/Deedle/pull/475

This includes support for exponential smoothing which is nice.

I would very much like an expanding version of exponential smoothing, much like the one in pandas. This is particularly helpful in financial applications, where you want to use all history up to the given point in time.

My commit below just follows pandas v0.24.2: series.ewm(alpha=0.97, adjust=False, ignore_na=True).mean()
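For readers without pandas at hand, here is a minimal sketch in plain Python of what that call computes, assuming the usual adjust=False recursion y_t = (1 - alpha) * y_{t-1} + alpha * x_t with NaNs skipped (ignore_na=True). The name `expanding_ema` is made up for illustration; it is not the Deedle or pandas API.

```python
import math


def expanding_ema(values, alpha):
    """Expanding EMA over all history (hypothetical helper, not a library call).

    Mirrors the behaviour of ewm(alpha=..., adjust=False, ignore_na=True).mean():
    a NaN input leaves the running average unchanged.
    """
    out = []
    avg = math.nan  # no observation seen yet
    for x in values:
        if not math.isnan(x):
            # The first observation seeds the average; later ones are blended in.
            avg = x if math.isnan(avg) else (1 - alpha) * avg + alpha * x
        out.append(avg)
    return out


print(expanding_ema([1.0, 2.0, math.nan, 3.0], alpha=0.5))
```

With alpha=0.5 the missing third value simply carries the previous average forward, and the fourth value is blended as if no gap had occurred.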

zyzhu commented 4 years ago

Thanks a lot for the contribution. I noticed that my lambda/alpha settings were not consistent with pandas, and I've fixed them.

My ewmMean also didn't handle missing values well. I've incorporated ideas and test cases from your code. Check the latest change here: https://github.com/fslaborg/Deedle/pull/475/commits/6821c203a6525ee0dc223b20d3b413bb35cfc770

I also need to fix ewmStdDev and ewmVariance, as they are a special approximation for financial return series that assumes the mean of returns is zero. My implementation is similar to the mftsr package in R, but these are not general functions for other series that have a nonzero mean. https://github.com/rforge/mftsr/blob/master/pkg/R/ewmaVol.R

zyzhu commented 4 years ago

BTW, I've renamed ewmStdDev to ewmVol and moved the other RiskMetrics-type functions to a separate finance module. I think a separate namespace with related functions benefits finance users specifically. https://github.com/fslaborg/Deedle/pull/475/commits/a2abc29673a947c0defbf555e8b7108991cd26eb

I will try to wrap up matrix dot operations and release a new version soon.

MikaelUmaN commented 4 years ago

Nice, thanks for incorporating it :).

Yes, dealing with NaNs is probably good. Often you want to look at time series of related markets that may not have the same opening hours (or, if looking at daily prices, some markets may be closed due to holidays etc.), so inevitably there will be NaNs.

In pandas there seem to be two options for how that affects the weighting of past values. The simplest one is ignore_na=True (my code). If it is False, they seem to decay the old weight by (1-alpha)**n, where n is the number of missing periods. That is also the default setting. But I couldn't quite reproduce their numbers.

It should have to do with this code:

if is_observation or (not ignore_na):

    old_wt *= old_wt_factor
    if is_observation:

        # avoid numerical errors on constant series
        if weighted_avg != cur:
            weighted_avg = ((old_wt * weighted_avg) +
                            (new_wt * cur)) / (old_wt + new_wt)
        if adjust:
            old_wt += new_wt
        else:
            old_wt = 1.

Have you looked at this setting? I think my preferred default would be for ignore_na to be True, but they have reasoned differently starting from v0.15.
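For comparison, here is a small plain-Python port of the loop quoted above, wrapped so that both ignore_na settings can be run side by side. It is a sketch: the function name `ewma_sketch`, the omission of min_periods, and the initialisation (new_wt is alpha when adjust is False, 1 otherwise; old_wt starts at 1) are assumptions about the surrounding pandas code that the quote does not show. One detail that may matter when reproducing their numbers is that the decayed old weight is renormalised by (old_wt + new_wt).

```python
import math


def ewma_sketch(values, alpha, adjust=False, ignore_na=False):
    """Port of the quoted loop (hypothetical wrapper, min_periods omitted)."""
    old_wt_factor = 1.0 - alpha
    new_wt = 1.0 if adjust else alpha  # assumed initialisation
    old_wt = 1.0
    weighted_avg = math.nan
    out = []
    for cur in values:
        is_observation = not math.isnan(cur)
        if not math.isnan(weighted_avg):
            if is_observation or (not ignore_na):
                # With ignore_na=False a missing period still decays the old weight.
                old_wt *= old_wt_factor
                if is_observation:
                    # avoid numerical errors on constant series
                    if weighted_avg != cur:
                        weighted_avg = ((old_wt * weighted_avg) +
                                        (new_wt * cur)) / (old_wt + new_wt)
                    old_wt = old_wt + new_wt if adjust else 1.0
        elif is_observation:
            weighted_avg = cur  # first observation seeds the average
        out.append(weighted_avg)
    return out


data = [1.0, math.nan, 3.0]
print(ewma_sketch(data, alpha=0.5, ignore_na=True))
print(ewma_sketch(data, alpha=0.5, ignore_na=False))
```

In this example the last value is 2.0 with ignore_na=True, while with ignore_na=False the old weight has decayed twice to 0.25, giving (0.25 * 1 + 0.5 * 3) / 0.75, roughly 2.33.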

MikaelUmaN commented 4 years ago

On another note, I'd be interested in adding some time series functionality such as ARMA-GARCH modeling.

Do you think that would fit, and if so, where? It would need to rely on probability distributions and some optimization for fitting parameters, so it would probably depend on e.g. Math.Net. It's hard to draw the line on whether to contribute there first or just keep it here... Because it's again related to financial applications, my thought would be to keep the probability distributions and optimization methods in Math.Net and the time series analysis in Deedle?

zyzhu commented 4 years ago

A GARCH model would be great. I've published Deedle.Math today. There is a Finance namespace reserved for cases like GARCH.

In principle, you are right about separating distribution/optimization from the actual application. But I wouldn't object to building a prototype here first if it takes longer to get pull requests merged into the Math.Net repo. Feel free to submit pull requests.

zyzhu commented 4 years ago

I'll take a closer look at how pandas handles missing values when ignore_na is False and add an optional parameter to ewmMean too.