facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.57k stars 4.54k forks

Pymc3 Version and Prediction Question #636

Closed luke14free closed 6 years ago

luke14free commented 6 years ago

Hi there, I am working on a PyMC3 version of your model ( https://github.com/luke14free/pm-prophet ). I am doing this mostly for fun and it's still far from usable/complete, but I see a lot of value in letting analysts specify their own priors for the regressors/changepoints/holidays, so PyMC3 seemed like a good choice given that I don't need to support R.

First thing I wanted to say is kudos for your work (both in terms of modelling and coding); I really like Prophet and the idea behind it 🙏!

Anyway, with this prior flexibility in mind, I was wondering about something I can't make much sense of in your prediction phase:

        # New changepoints from a Poisson process with rate S on [1, T]
        if T > 1:
            S = len(self.changepoints_t)
            n_changes = np.random.poisson(S * (T - 1))
        else:
            n_changes = 0
        if n_changes > 0:
            changepoint_ts_new = 1 + np.random.rand(n_changes) * (T - 1)
            changepoint_ts_new.sort()
        else:
            changepoint_ts_new = []

        # Get the empirical scale of the deltas, plus epsilon to avoid NaNs.
        lambda_ = np.mean(np.abs(deltas)) + 1e-8

        # Sample deltas
        deltas_new = np.random.laplace(0, lambda_, n_changes)

        # Prepend the times and deltas from the history
        changepoint_ts = np.concatenate((self.changepoints_t,
                                         changepoint_ts_new))
        deltas = np.concatenate((deltas, deltas_new))

If I understand what this code does, you simulate new changepoints in the future and draw random magnitudes for them from a Laplace distribution fitted to the historical changepoints. My question is: why do you do this? My thoughts:

1) Changepoints are mostly specified by the user for past shocks (new product launches, news, whatever), so I wouldn't want them re-used in predictions: there is no evidence that they will occur again in the future, or at least the analyst should be able to specify that.

2) In any case, you are using the mean of the previous changepoints to set the scale parameter of the Laplace distribution, which just doesn't feel right to me (note: I am no statistician), as I see some heavy non-linearity in how the Laplace distribution works. An example: you have one trend changepoint of 5 because of a product change, and then 9 other changepoints of 0. The mean absolute value is 0.5, so you would be sampling from a Laplace(0, 0.5), whose variance is 2b² = 0.5 (and mu = 0, so there should be no extra bias). But (a) as an analyst I don't see any reason to sample new changepoints for prediction close to 0.5 when I only had a single shock in my past data, and (b) given the non-linearity, if you wanted to do this, it could be better to randomly choose a past changepoint and apply it (or use the median/mode?).
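The scenario in (2) as a quick NumPy sketch (illustrative numbers only, mirroring the mean-absolute-delta scale used in the code above):

```python
import numpy as np

# One shock of 5 and nine changepoints of 0, as in the example above.
deltas = np.array([5.0] + [0.0] * 9)

# Prophet's empirical scale for future deltas: mean absolute delta.
lambda_ = np.mean(np.abs(deltas)) + 1e-8   # 0.5

# Future deltas would then come from Laplace(0, 0.5): typical simulated
# magnitudes are around 0.5, even though the history contained either a
# shock of 5 or no change at all.
rng = np.random.default_rng(0)
deltas_new = rng.laplace(0.0, lambda_, size=100_000)
```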

My overall feeling is that this process could just add noise to the predictions, since it can inflate the variance of the simulated future series.

Sorry for the long message and thanks in advance! 😅

bletham commented 6 years ago

These are good questions.

This code/procedure is for estimating uncertainty in the predictions. In the Prophet model, the trend is modeled as being piecewise linear (or piecewise logistic if so specified). A major source of uncertainty in the future prediction is the possibility that there will be future trend changes. We can't really say in general what the distribution of future trend changes will be, and so Prophet does something that I think is about the best thing we could do: Assume that the future will see the same rate and magnitude of changepoints as the past. This won't always be correct (actually it will probably never be entirely correct), but we have to assume something and I think that's about as reasonable as can be done. With that introduction, here are your specific questions:

  1. Changepoints could be manually specified to mark specific past shocks. But by default we just include a large number of changepoints at evenly spaced points to be able to fit whatever trend changes there may have been in the past. If I know that there were shocks in the past that won't occur in the future, then this procedure would overestimate the future uncertainty. Likewise, maybe the trend didn't change at all in the past but I know it will in the future. I would then be underestimating the future uncertainty. Right now there's no way to insert this sort of prior; there's an issue open for it in #293.

  2. The Prophet model is that the magnitude of the change at each of the historical changepoints is delta_i ~ Laplace(tau), where tau is the changepoint_prior_scale that you can set when you create the model. We estimate each delta_i during model fitting. Our goal now is to sample similar deltas to use at these simulated future changepoints, and we do that with an empirical Bayes approach, where we use the data to specify the prior rather than using a true prior (like tau). In this case, if delta_i ~ Laplace(lambda), then the average of |delta_i| is the maximum likelihood estimate for lambda. Empirical Bayes generally (and this approach in particular) is an approximation to a hierarchical model, which is what would be the most correct thing to do here but would be difficult to infer without requiring MCMC for model fitting. The alternative you describe, of selecting future changepoints from the distribution of past changepoints, is reasonable and would be the non-parametric version of what is done here parametrically with the Laplace distribution. Being parametric is advantageous when the model is good, and in this case, since we have already assumed a Laplace distribution to infer delta_i in the first place, it seems reasonable.
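The MLE claim can be checked numerically. This is a quick sketch, not Prophet code: it simulates "fitted" deltas with a known scale, recovers the scale as the mean absolute delta, confirms by brute-force likelihood maximization, and shows the non-parametric bootstrap alternative from the question.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "fitted" deltas with true Laplace scale 0.3.
deltas = rng.laplace(0.0, 0.3, size=50_000)

# Empirical Bayes scale, as in Prophet: the MLE for a Laplace scale
# is the mean absolute deviation.
lambda_hat = np.mean(np.abs(deltas))

# Verify by brute force: Laplace(0, b) log-likelihood is
# -n*log(2b) - sum|x_i|/b; its maximizer should match lambda_hat.
bs = np.linspace(0.2, 0.4, 401)
loglik = -len(deltas) * np.log(2 * bs) - np.abs(deltas).sum() / bs
b_best = bs[np.argmax(loglik)]

# The non-parametric alternative suggested in the question: resample
# past deltas directly instead of fitting a Laplace.
deltas_new = rng.choice(deltas, size=10, replace=True)
```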

One thing to note also is that this is not used for the main forecast yhat at all; it's only used for the uncertainty interval.
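A minimal sketch of that idea (not Prophet's actual implementation; all names and values are illustrative): simulate many future trend-slope paths under the same-rate, same-magnitude assumption, then take quantiles across paths to form the interval, leaving the point forecast (slope k, no new changepoints) untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fitted quantities (illustrative values): base growth rate, number of
# historical changepoints, and the empirical Laplace scale mean(|deltas|).
k = 1.0
S = 10
lambda_ = 0.05

def sample_trend_slope(t, n_paths=2000):
    """Sample the trend slope at future times t (history rescaled to [0, 1])."""
    T = t[-1]
    slopes = np.empty((n_paths, len(t)))
    for i in range(n_paths):
        # Future changepoints arrive at the historical rate...
        n_changes = rng.poisson(S * (T - 1))
        ts = 1 + rng.random(n_changes) * (T - 1)
        # ...with magnitudes from the empirically scaled Laplace.
        ds = rng.laplace(0.0, lambda_, n_changes)
        # Slope at time t is k plus every delta whose changepoint precedes t.
        slopes[i] = k + (ds[None, :] * (ts[None, :] <= t[:, None])).sum(axis=1)
    return slopes

t = np.linspace(1.0, 2.0, 5)
slopes = sample_trend_slope(t)

# Only the interval comes from the simulation; yhat would use the
# deterministic trend with no new changepoints.
lower, upper = np.percentile(slopes, [10, 90], axis=0)
```

The interval widens with the horizon, since more changepoints can accumulate the further out you forecast.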

This is described in a bit more detail in Section 3.1.4 of the paper here: https://peerj.com/preprints/3190.pdf .

luke14free commented 6 years ago

Thanks for the directions, makes sense. Closing the issue.