facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.42k stars 4.53k forks

prior specification #731

bgstn closed 5 years ago

bgstn commented 5 years ago

Hi,

I wonder whether Prophet has an option for setting prior parameters beyond those for changepoints, holidays and seasonality.


Here's the story: in a situation where there are predecessors and successors, I could fit Prophet to the predecessor's data to get the parameters of its distributions, set them as the parameters of the prior distribution for the successor, and then update them with the successor's data, just like typical Bayesian data analysis.

I browsed through the whole documentation and could only find prior scales for changepoints, holidays and regressors. I didn't find any way to specify location parameters or parameters for the trend component.

Is there any way to specify those in Prophet, or any other way to do something similar?

Any help would be greatly appreciated!

Frank

bletham commented 5 years ago

This is an interesting idea - basically using a past model fit as a prior for the current.

You can see that in the Stan model all of the parameters are given priors with 0 mean: https://github.com/facebook/prophet/blob/master/R/inst/stan/prophet.stan#L103

Conceptually, it could be reasonable to replace those 0's with model parameters from a previous fit. For the seasonality this seems reasonable. For the trend it's less clear that we would get the behavior we want when the new trend does not follow the old trend closely, which is the case where the prior should wash out.

By default we have 25 trend changepoints in the first 80% of the data. Suppose we set the prior mean for the delta at each of them to the fitted parameter from the earlier time series, but the new time series turns out to be very different from the earlier one. What we would intuitively want is that, as we collect data and start to realize that this time series is very different, we would no longer rely on that prior. What would actually happen is that we would wash out the deltas for the part of the time series that we have observed (we have data there and it is very different from the prior), but the deltas for changepoints that we have not yet observed in the new time series would still have the same prior. I think the desired behavior is one where the trend prior washes out globally, which we wouldn't get by setting the mean for each delta. Does that make sense?
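A toy illustration of the per-delta problem, not Prophet's actual model: it assumes Gaussian priors and that each delta is observed directly with Gaussian noise (Prophet puts a Laplace prior on the deltas and infers them through the trend), so the posterior mean has a closed form. The `posterior_mean` helper and all numbers are made up for illustration.

```python
def posterior_mean(prior_mean, prior_sd, obs, obs_sd):
    """Normal-normal posterior mean for one delta, given noisy observations of it."""
    if not obs:
        # No data for this changepoint yet: the posterior is just the prior.
        return prior_mean
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(obs) / obs_sd ** 2
    obs_mean = sum(obs) / len(obs)
    return (prior_prec * prior_mean + data_prec * obs_mean) / (prior_prec + data_prec)


# Hypothetical deltas fitted on the earlier series, reused as prior means.
old_deltas = [0.5, 0.5, 0.5, 0.5]

# The new series is flat (delta = 0 everywhere), but only the first two
# changepoints fall inside the data observed so far.
new_obs = [[0.0] * 30, [0.0] * 30, [], []]

posterior = [
    posterior_mean(mu, 0.1, obs, 0.1) for mu, obs in zip(old_deltas, new_obs)
]
print(posterior)
```

Under these assumptions the first two deltas shrink to roughly 0.02, while the last two stay at exactly 0.5: the evidence that the new series differs does nothing for the changepoints that have not been observed yet.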

Practically, this is a bit challenging to try out because the time series are scaled prior to fitting: y is normalized by its maximum value, and ds is converted to a float on [0, 1]. If we are looking at partial data, the series is shorter, so a different scaling is applied and the actual parameter values are not comparable. We'd need a way of transferring scaling parameters from one model to another. This is related to #46, but takes things a step further by not only initializing the parameters from the same point but actually using that point as a prior.
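A rough sketch of why the scaled parameters are not comparable, assuming linear growth, time scaled to [0, 1] over the training span, and y divided by its maximum absolute value. `scale_series`, `fitted_slope` and `transfer_slope` are hypothetical helpers written for this example, not part of Prophet.

```python
def scale_series(t, y):
    """Scale t to [0, 1] and y by its max absolute value; also return the scales."""
    t_span = t[-1] - t[0]
    y_scale = max(abs(v) for v in y)
    t_scaled = [(ti - t[0]) / t_span for ti in t]
    y_scaled = [v / y_scale for v in y]
    return t_scaled, y_scaled, t_span, y_scale


def fitted_slope(t, y):
    """Slope of a noiseless straight line, standing in for the fitted growth rate."""
    return (y[-1] - y[0]) / (t[-1] - t[0])


def transfer_slope(k, src, dst):
    """Re-express a scaled slope from model `src` in the scaling of model `dst`.

    `src` and `dst` are (t_span, y_scale) pairs.
    """
    k_original = k * src[1] / src[0]  # back to original units per day
    return k_original * dst[0] / dst[1]


days = list(range(101))
y = [10.0 + 2.0 * d for d in days]  # the same 2 units/day trend throughout

t_f, y_f, span_f, scale_f = scale_series(days, y)            # full series
t_p, y_p, span_p, scale_p = scale_series(days[:51], y[:51])  # first half only

k_full = fitted_slope(t_f, y_f)
k_part = fitted_slope(t_p, y_p)
k_moved = transfer_slope(k_full, (span_f, scale_f), (span_p, scale_p))
print(k_full, k_part, k_moved)
```

The underlying trend is identical, yet `k_full` and `k_part` differ because each fit uses its own scaling. Only after converting through the stored scales (`k_moved`) do the two values agree, which is the kind of transfer a prior from an earlier fit would need.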

bgstn commented 5 years ago

Thanks for the detailed reply.

There is more information I'd like to add. In the project I'm involved in, which is somewhat special but also common for short-lifecycle products in certain industries (such as manufacturing), each generation has an expected launch date and end date. Therefore, I know roughly when each generation is supposed to be terminated. If my understanding is right, the normalization of time shouldn't be a big problem here.

However, the prior for the unseen period and the normalization of y may be more serious issues. For the prior on the unseen period, given predecessors and successors, a better estimate of the changepoints in the near future may come from the predecessor at the same lifecycle stage rather than from the successor's own historical data, although relying on it still carries risk when making predictions.

I looked through several issues, such as #46 and #630. It still seems to be an open question.

bletham commented 5 years ago

One thing you could try would be to include the previous time series as an extra regressor when fitting the current one. Since the prior on the trend is a constant trend (delta = 0), I'd expect this to be similar to setting the previous time series as a prior for the trend. The difference is that this actually does have the desirable behavior described above: if the time series diverge early on, we will give 0 weight to the extra regressor and the "prior" will wash out.
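A minimal sketch of that idea, assuming daily data, that the two series are aligned by days since launch, and that the predecessor's full lifecycle is already known (so the regressor has values for the forecast period). `build_frames` and `fit_and_forecast` are hypothetical helpers; the `predecessor` column name is arbitrary.

```python
from datetime import date, timedelta


def build_frames(pred_values, succ_values, succ_launch, horizon):
    """Align the predecessor to the successor by days since launch.

    pred_values: predecessor's daily values over its whole lifecycle.
    succ_values: successor's daily values observed so far.
    Returns (history_rows, future_rows) as lists of dicts.
    """
    n_obs = len(succ_values)
    n_total = n_obs + horizon
    if n_total > len(pred_values):
        raise ValueError("predecessor is too short to cover the forecast horizon")
    future = [
        {
            "ds": (succ_launch + timedelta(days=i)).isoformat(),
            "predecessor": pred_values[i],
        }
        for i in range(n_total)
    ]
    history = [dict(row, y=succ_values[i]) for i, row in enumerate(future[:n_obs])]
    return history, future


def fit_and_forecast(history, future):
    # Imported lazily so the alignment helper works without Prophet installed.
    import pandas as pd
    from prophet import Prophet  # older releases: from fbprophet import Prophet

    m = Prophet()
    m.add_regressor("predecessor")  # previous generation as an extra regressor
    m.fit(pd.DataFrame(history))
    return m.predict(pd.DataFrame(future))


history, future = build_frames(
    pred_values=[float(v) for v in range(100)],
    succ_values=[1.0, 2.0, 3.0],
    succ_launch=date(2020, 1, 1),
    horizon=2,
)
```

With real data, `fit_and_forecast(history, future)` would return the usual forecast dataframe. The regressor must be present in the future frame as well, which is why the predecessor needs to be at least as long as the observed history plus the horizon.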

bgstn commented 5 years ago

Thanks for the great advice!

I haven't had the chance to verify it yet. However, I think it should be a great way to achieve that behavior.

On the other hand, this approach raises a minor problem: it seems it cannot give any predictions in the early stage of the current generation, since there is not enough training data, which the "prior" method doesn't have to deal with.

Other than that, the regressor (feature) approach should give better performance. I'll try it and see if it works.