alkaline-ml / pmdarima

A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
https://www.alkaline-ml.com/pmdarima
MIT License
1.58k stars, 234 forks

support holes in the series with fb_prophet style fourier terms #280

Open ihadanny opened 4 years ago

ihadanny commented 4 years ago

This is pretty sketchy, but I would like to fit the workdays and the non-workdays (weekends + holidays) separately. (I know that ARIMA won't like the discontinuities, but I'd like to try it anyway.)

This is very difficult to do with the current FourierFeaturizer, as it assumes that the series is continuous, so I tried the following hack: I let the featurizer know the dates through the index of the exogenous array, and use Prophet to create the correct Fourier terms for both fit and predict:

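from fbprophet import Prophet  # assumption: the package name at the time (now `prophet`)
from pmdarima import preprocessing as ppc

class MyFourierFeaturizer(ppc.FourierFeaturizer):
    def transform(self, y, exogenous=None, n_periods=0, **_):
        # exogenous must be a DataFrame whose datetime index is named 'ds'
        _, exog = self._check_y_exog(y, exogenous, null_allowed=True)
        # build the Fourier terms from the actual dates (Prophet expects the period in days)
        X_fourier = Prophet.fourier_series(exogenous.reset_index()['ds'], self.m, self.k)
        exog = self._safe_hstack(exog, X_fourier)
        # safe_hstack ruins the index :(
        exog.index = exogenous.index
        return y, exog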

What do you say? Is this the way to go, or am I missing something much easier? Or am I messing everything up by trying this? :)
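To make the idea concrete, here is a minimal standard-library sketch of date-based Fourier terms, assuming (as I understand Prophet's `fourier_series`) that each term is a sine or cosine of the time in days since the Unix epoch. The name `fourier_terms` and the weekly period are illustrative only and are not part of pmdarima or Prophet. Because each row depends only on its own timestamp, dropping the weekends leaves the terms for the remaining rows unchanged.

```python
import math
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def fourier_terms(timestamps, period_days, order):
    """Return one row [sin_1, cos_1, ..., sin_order, cos_order] per timestamp.

    Hypothetical helper: the phase comes from the timestamp itself,
    not from the row's position in the series.
    """
    rows = []
    for ts in timestamps:
        t = (ts - EPOCH).total_seconds() / 86400.0  # time in days
        row = []
        for k in range(1, order + 1):
            angle = 2.0 * math.pi * k * t / period_days
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Two weeks of half-hourly stamps starting on a Monday, then weekdays only
start = datetime(2000, 6, 5)
full = [start + timedelta(minutes=30 * i) for i in range(14 * 48)]
workdays = [ts for ts in full if ts.weekday() < 5]

# Map each timestamp to its terms, for the full and the filtered series
full_terms = dict(zip(full, fourier_terms(full, period_days=7, order=2)))
workday_terms = dict(zip(workdays, fourier_terms(workdays, period_days=7, order=2)))

# The terms of every surviving row match the full-calendar terms exactly
same_phase = all(workday_terms[ts] == full_terms[ts] for ts in workdays)
```

A position-based featurizer would instead treat Monday 00:00 as the step right after Friday 23:30, which shifts the phase of every term after the first gap.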

tgsmith61591 commented 4 years ago

Hmm, this is pretty interesting. Could you by chance share a data snippet for me to tinker with? I like the idea of this a lot.

ihadanny commented 4 years ago

Cool! The dataset I'm working with is an hourly aggregation of https://data.cityofnewyork.us/Transportation/2017-Green-Taxi-Trip-Data/5gj9-2kzx. I can provide you with a snippet, but I think the best option would be to demo this on one of your standard datasets:

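import pandas as pd
from pmdarima.datasets import load_taylor

taylor = load_taylor(True)  # True -> return a pd.Series
# half-hourly timestamps; drop the last stamp so the index matches the series length
taylor.index = pd.date_range('2000-06-05', '2000-08-28', freq='30T')[:-1]
taylor.index.name = 'ds'
taylor_workdays = taylor[taylor.index.dayofweek < 5]  # keep Monday-Friday only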

what do you say?
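For reference, a small standard-library sketch of the holes that this weekday filter leaves in the index. It rebuilds only the timestamps from the date range above (no data, no pmdarima), and the variable names are illustrative. It filters on weekday only, so holidays are not handled.

```python
from datetime import datetime, timedelta

# Same half-hourly grid as the date range above: 84 days * 48 stamps per day
start = datetime(2000, 6, 5)  # a Monday
stamps = [start + timedelta(minutes=30 * i) for i in range(84 * 48)]

# Keep Monday-Friday, mirroring `dayofweek < 5`
workday_stamps = [ts for ts in stamps if ts.weekday() < 5]

# Differences between consecutive stamps reveal the discontinuities
gaps = [b - a for a, b in zip(workday_stamps, workday_stamps[1:])]
largest_gap = max(gaps)
n_jumps = sum(1 for g in gaps if g > timedelta(minutes=30))

print(len(stamps), len(workday_stamps), largest_gap, n_jumps)
```

Each jump runs from Friday 23:30 to Monday 00:00, which is exactly where a featurizer that assumes a continuous series would lose track of the phase.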