miguelangelnieto closed this issue 4 years ago.
@miguelangelnieto I'm going to file this as an enhancement but add it as a wishlist milestone. Do you have a motivating use case? I'm not sure our fitting procedure is actually capable of incremental updates, so actually what we'd end up doing is re-fitting the whole model each time (maybe with the previous starting parameter values).
Let's say that you fit a year of data and have a model ready to predict.
The idea is to be able to keep adding new data to that model in small batches. For example, read metrics from Prometheus and add the data from the last hour. Extracting a lot of metric points from services like Prometheus is slow and time consuming, so after the first large fit it would be nice to keep fitting new data in small batches.
Some sklearn models have the partial_fit possibility, for example Gaussian Naive Bayes.
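For reference, a minimal sketch of the sklearn-style online-learning API being described here (GaussianNB on synthetic arrays; the data are illustrative, not Prophet inputs):

```python
# sklearn's incremental-learning pattern: partial_fit absorbs new batches
# without refitting from scratch -- the behaviour being requested for Prophet.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X1, y1 = rng.normal(size=(100, 3)), rng.integers(0, 2, 100)  # initial large batch
X2, y2 = rng.normal(size=(20, 3)), rng.integers(0, 2, 20)    # later small batch

clf = GaussianNB()
clf.partial_fit(X1, y1, classes=[0, 1])  # first call must declare all classes
clf.partial_fit(X2, y2)                  # incremental update with new data only
print(clf.predict(X2[:5]))
```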
The online learning used by some sklearn models is pretty fundamentally different from how Stan models are fit; I don't think we are going to have a partial_fit like that in the future. Like @seanjtaylor said, we could warm-start the fit, which should make things very fast if only a few data points are added. @seanjtaylor if it sounds good to you we can put it on the v0.2 list.
@bletham Dustin Tran figured out a hack to do minibatch ADVI in Stan, which could work. A few people are working on adding streaming variational Bayes to Stan 3, along the lines of Pymc3's inference engine.
Fascinating, that's an impressive hack. In any case we should add an interface to Stan's variational inference engine; I haven't yet tried it in this setting to see if it gives something reasonable.
@bletham You also might be able to pass in the distributions of the previous fit as priors. You could then fit on either a small window of data, or just the new datapoints, since the priors are now informative. A few scikit-learn estimators implement this as the "warm_start" parameter. Not sure how fast this would be though, since pystan (as far as I know) has to recompile the model.
Edit: Following Dustin's comment on this thread, you'd also need to figure out how to scale the likelihood if you used the minibatch interface.
For starters, I think we can make a function that takes a model, creates a new model with the same input args/ seasonalities / regressors, and then fits that model but with the initial conditions taken as the parameters of the input model (rather than the current defaults in https://github.com/facebook/prophet/blob/master/python/fbprophet/forecaster.py#L1018). This would be pretty straightforward, and should be much faster than fitting totally from scratch if the data have not changed much.
The only potential challenge is that if we have added a lot more data, the trend changepoints will be in totally different places in the new time series and the old trend changepoint values would be pretty useless. So we'd want to do something where we initialize from old trend changepoint values if the changepoints are close in time, but if they're really far we just fall back to the current default initialization of 0.
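A hypothetical sketch of that fallback logic (the helper name and the tolerance are made up; this is not Prophet API): reuse an old `delta` entry only when a new changepoint lands close in time to an old one, and otherwise fall back to the default initialization of 0.

```python
# Sketch: map old trend changepoint values onto new changepoint locations.
import numpy as np
import pandas as pd

def init_delta(old_cps, old_delta, new_cps, tol=pd.Timedelta("30D")):
    old_cps = pd.to_datetime(old_cps)
    new_delta = np.zeros(len(new_cps))  # default init of 0 everywhere
    for i, cp in enumerate(pd.to_datetime(new_cps)):
        dists = np.abs(old_cps - cp)
        j = int(np.argmin(dists))
        if dists[j] <= tol:             # close enough in time: reuse old value
            new_delta[i] = old_delta[j]
    return new_delta

old_cps = ["2017-03-01", "2017-06-01"]
old_delta = np.array([0.5, -0.2])
new_cps = ["2017-03-05", "2017-12-01"]  # first is near an old one, second is not
print(init_delta(old_cps, old_delta, new_cps))  # first entry reused, second is 0
```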
@bletham I don't use custom changepoints; I only have time series with some exogenous regressors. Does that mean I can use the method you mentioned?
The use case is https://github.com/facebook/prophet/issues/882: I want to roll the train and predict windows forward to check how well Prophet performs on my dataset.
@eromoe I'll give some thoughts on that specific use case at #882, but for the more general question: when custom changepoints are not specified, they are just placed uniformly through the history. The challenge that creates here is that adding more data means we should have more and/or differently placed changepoints.
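To make the uniform placement concrete, here is roughly how automatic changepoints end up positioned (a sketch, not the actual Prophet source): evenly spaced over the first `changepoint_range` fraction of the history, so appending data shifts every location.

```python
# Sketch of uniform changepoint placement over the start of the history.
import numpy as np
import pandas as pd

dates = pd.date_range("2017-01-01", periods=100, freq="D")
n_changepoints, changepoint_range = 5, 0.8

hist_size = int(np.floor(len(dates) * changepoint_range))  # first 80% of history
idx = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)[1:]
print(dates[idx])  # adding more dates would move all of these
```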
This is great information! Looking at the way Stan's optimizing function is being called, and according to @bletham's Nov 2nd, 2018 comment above, it looks like we can pass previously-trained model params by doing this:

```python
p = Prophet(**kwargs)
p.fit(df)

p2 = Prophet(**kwargs)
p2.fit(pd.concat([df, additional_week_of_daily_data_df]), init=p.params)
```

and `init` in `p2.fit` should effectively be passed through to Stan's optimizing function call. Does that make sense @bletham, or am I missing something? I'm assuming this should work even if we have extra regressors and custom seasonalities?
Edit: When I try the above, I get the error below. There must be something I'm missing.

```
mismatch in number dimensions declared and found in context;
processing stage=initialization; variable name=k; dims declared=(); dims found=(1,1)
WARNING:fbprophet:Optimization terminated abnormally. Falling back to Newton.
mismatch in number dimensions declared and found in context;
processing stage=initialization; variable name=k; dims declared=(); dims found=(1,1)
```
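The error message points at the cause: after a MAP fit the parameters carry an extra leading dimension (so MAP and MCMC results share a shape), while Stan declares `k` as a scalar. A toy illustration with made-up values:

```python
# Why init=p.params fails: the stored shapes don't match Stan's declarations.
import numpy as np

params = {"k": np.array([[0.3]]), "delta": np.array([[0.01, -0.02]])}

print(np.shape(params["k"]))   # (1, 1), but Stan declares k with dims=()
k_scalar = params["k"][0][0]   # unwrap to the scalar Stan expects
delta_vec = params["delta"][0] # unwrap to the 1-D vector Stan expects
print(np.shape(k_scalar), delta_vec.shape)
```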
That's most of what's involved, but there is one additional detail. If you compare `p.params` to the default initialization here: https://github.com/facebook/prophet/blob/480b67b8fed05381b910bce6e7a64efba9f0d838/python/fbprophet/forecaster.py#L1091
you'll see that `p.params` contains nested arrays. This is so that downstream we have consistent shapes for params whether we did MAP fitting or MCMC. Basically you just need to extract things from the arrays for it to work. This works:

```python
m = Prophet()
m.fit(df)

m2 = Prophet()

def stan_init2():
    res = {}
    for pname in ['k', 'm', 'sigma_obs']:
        res[pname] = m.params[pname][0][0]
    for pname in ['delta', 'beta']:
        res[pname] = m.params[pname][0]
    return res

m2.fit(df, init=stan_init2)
```

(Note that the documentation for `pystan.StanModel.optimizing` says `init` should be a callable that returns a dictionary, as done here, but directly supplying the dictionary seems to work too.)
I will repeat my caveats from above:
- If the number of changepoints changes from one model to the next, this will error because `delta` will be the wrong size.
- If the locations of the changepoints in time have changed greatly, this may do worse than the default initialization because the initial trend may be very bad.

Perfect, the code works like a charm. Thanks for the details and for re-listing the caveats @bletham. Using the timeit magic, here is a comparison of the time needed to fit Prophet on two datasets that differ by 7 datapoints (daily data, so m2 has 1 additional week of data):

```
Model 1 fit: 98.4 ms ± 7.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Model 2 fit (using stan_init2 by Ben above): 55.9 ms ± 546 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

That's over 40% faster!
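The first caveat above can be checked before attempting a warm start. A hypothetical guard (made-up helper, not Prophet API): reuse the old parameters only if the changepoint count still matches, and otherwise return `None` so the default initialization is used.

```python
# Guard against the delta-size caveat when warm-starting.
import numpy as np

def safe_init(old_params, n_changepoints):
    delta = np.asarray(old_params["delta"])[0]
    if delta.size != n_changepoints:
        return None  # sizes differ: fall back to the default initialization
    return {
        "k": old_params["k"][0][0],
        "m": old_params["m"][0][0],
        "sigma_obs": old_params["sigma_obs"][0][0],
        "delta": delta,
        "beta": np.asarray(old_params["beta"])[0],
    }

params = {"k": [[0.3]], "m": [[0.1]], "sigma_obs": [[0.05]],
          "delta": [[0.0, 0.1, -0.2]], "beta": [[0.4, 0.2]]}
print(safe_init(params, 3) is not None)  # changepoint counts match: warm start
print(safe_init(params, 25))             # mismatch: None, use defaults
```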
On a separate note, `predict` takes significantly longer than `fit`. I tried lowering `uncertainty_samples` down to 20 and even 0, but the time to predict didn't budge. I can take this discussion to a different thread if needed.
Model 1 predict: 1.33 s ± 9.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
@sammourad You need to completely remove the uncertainty-sample calculation code; there is a PR about it.
I see the PR @eromoe ... Thank you for the help!
Hi @bletham, I implemented this based on your example, but the processing time of the 2 models is the same; it doesn't improve as much as in @sammo's case. Here is my script:
```python
### Period-training
start_time = time.time()
df = pd.read_csv('revenue_old.csv')
m = Prophet(interval_width=0.95, changepoint_range=0.9, changepoint_prior_scale=200,
            daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
            seasonality_mode='multiplicative', n_changepoints=200)
m.fit(df)
end_time = time.time()
spent_time = end_time - start_time
print("Time spent model 1: ", spent_time)

future = m.make_future_dataframe(periods=60)
# future['bitcoin_price'] = df['bitcoin_price']
# future['litecoin_price'] = df['litecoin_price']
future = future.fillna(0)
forecast = m.predict(future)
fig1 = m.plot(forecast)

start_time = time.time()
df = pd.read_csv('revenue_new.csv')
m2 = Prophet(interval_width=0.95, changepoint_range=0.9, changepoint_prior_scale=200,
             daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
             seasonality_mode='multiplicative', n_changepoints=200)

def stan_init2():
    res = {}
    for pname in ['k', 'm', 'sigma_obs']:
        res[pname] = m.params[pname][0][0]
    for pname in ['delta', 'beta']:
        res[pname] = m.params[pname][0]
    return res

m2.fit(df, init=stan_init2)
end_time = time.time()
spent_time = end_time - start_time
print("Time spent model 2: ", spent_time)

future = m2.make_future_dataframe(periods=60)
future = future.fillna(0)
forecast = m2.predict(future)
fig2 = m2.plot(forecast)
```
Here is the processing time:

```
Time spent model 1: 34.34286665916443
Time spent model 2: 33.433412313461304
```

Please let me know where I went wrong. Thanks a lot.
@bigredbug47 The code seems fine to me. If the dataframes are sufficiently different, there may not be much benefit to warm-starting. Maybe try using the same fit dataframe for both and see if the second fit is then shorter?
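That comparison can be kept honest with a small timing harness. In this sketch `fit_cold` and `fit_warm` are stand-ins (with Prophet installed they would wrap `m.fit(df)` and `m2.fit(df, init=stan_init2)` on the same dataframe):

```python
# Minimal harness for comparing a cold fit against a warm-started fit.
import time

def timed(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def fit_cold():   # placeholder for Prophet().fit(df)
    time.sleep(0.02)

def fit_warm():   # placeholder for Prophet().fit(df, init=stan_init2)
    time.sleep(0.01)

t_cold, t_warm = timed(fit_cold), timed(fit_warm)
print(f"cold: {t_cold:.3f}s  warm: {t_warm:.3f}s")
```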
@bigredbug47 I only had one additional week of daily data when 'partial training', and the additional values were fairly close to the previous history, as @bletham mentioned.
In order to fit the model with data from other sources, like Prometheus, it would be nice to have partial_fit implemented, so new data can be added periodically.