facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License
18.26k stars 4.51k forks

How can I efficiently train and evaluate 3000 stock series at the same time using prophet? #620

Closed gongshaojie12 closed 6 years ago

gongshaojie12 commented 6 years ago

I have data for 3000 stocks to forecast with prophet. How can I efficiently train and evaluate models for all 3000 at the same time and get the best model? Thanks!

gongshaojie12 commented 6 years ago

I have 3000 stock series to train on, and I can't build 3000 models by hand. Is there a way to train and evaluate them at the same time?

APramov commented 6 years ago

If you are thinking of modelling them as univariate series (it is my understanding that prophet works only with univariate series), then you will have to do that in a loop (or some variant of one). I am sure you can easily parallelize it, as each model's output does not rely on the output of the others.
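Because each series is fit independently, the loop can be spread across workers. The following is a minimal sketch of that idea, not a Prophet API: `fit_and_score` is a hypothetical stand-in for fitting one model (a naive last-value forecast, so the example runs without prophet installed), and `run_all` and the ticker names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_and_score(item):
    """Hypothetical stand-in for fitting one model on one series.

    Replace the body with a prophet fit/predict for a real run; here a
    naive last-value forecast is scored on the final observation.
    """
    ticker, values = item
    train, actual = values[:-1], values[-1]
    forecast = train[-1]  # naive forecast: repeat the last training value
    return ticker, abs(actual - forecast)


def run_all(series_by_ticker, fit_fn=fit_and_score,
            executor_cls=ThreadPoolExecutor, max_workers=4):
    """Apply fit_fn to every series independently and collect the results."""
    with executor_cls(max_workers=max_workers) as pool:
        return dict(pool.map(fit_fn, series_by_ticker.items()))


# Toy data: two made-up tickers with a handful of prices each.
series = {"AAA": [1.0, 2.0, 3.0, 5.0], "BBB": [10.0, 10.0, 10.0, 10.0]}
errors = run_all(series)
```

For CPU-bound fitting, passing `executor_cls=ProcessPoolExecutor` (also from `concurrent.futures`) may be a better fit than threads, provided `fit_fn` is defined at module level so it can be pickled.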

Prophet has (time series) cross-validation, which, to my understanding, is essentially the usual walk-forward forecast performance evaluation wrapped in a convenient function (I am still playing around with that function, but that's what I have gathered from it so far).

Take a look at prophet::cross_validation and, subsequently, prophet::performance_metrics in the prophet package. Also take a look at one general setup for evaluating out-of-sample forecast performance by Prof. Hyndman here
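To make the walk-forward idea concrete, here is a conceptual sketch in plain Python. It is not prophet's implementation of `cross_validation`: the function name `walk_forward_mae`, the integer-index arguments, and the naive `last_value_forecast` baseline are all assumptions chosen so the example runs on its own. In the Python package, the corresponding functions live in the diagnostics module.

```python
def walk_forward_mae(values, initial, period, horizon, forecast_fn):
    """Rolling-origin evaluation: refit at each cutoff, score the next points.

    values: observations in time order
    initial: number of points in the first training window
    period: spacing between successive cutoffs
    horizon: number of points forecast after each cutoff
    forecast_fn: callable(train, horizon) -> list of `horizon` forecasts
    """
    abs_errors = []
    cutoff = initial
    while cutoff + horizon <= len(values):
        train = values[:cutoff]
        actual = values[cutoff:cutoff + horizon]
        predicted = forecast_fn(train, horizon)
        abs_errors.extend(abs(a - p) for a, p in zip(actual, predicted))
        cutoff += period
    if not abs_errors:
        raise ValueError("series too short for the requested initial/horizon")
    return sum(abs_errors) / len(abs_errors)


def last_value_forecast(train, horizon):
    """Naive baseline used only to make the sketch runnable."""
    return [train[-1]] * horizon


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mae = walk_forward_mae(prices, initial=4, period=2, horizon=2,
                       forecast_fn=last_value_forecast)
```

With this toy series there are two cutoffs (after 4 and after 6 points), and the same scoring loop could wrap any model that can be refit on `train`.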

EDIT: In terms of how to set the model up - if you model the price, I would use a linear trend with changepoints (for financial data it seems more reasonable than the nonlinear, saturating growth model). Seasonality really depends on the granularity of the data that you have and whether it is even appropriate to assume for your data.

gongshaojie12 commented 6 years ago

Thanks a lot. I am now running prophet on Spark, but it is still very slow, and I want to speed up each training run. Each training sample has 30,000 data points. How can I do that? @APramov @bletham

APramov commented 6 years ago

How slow is it per iteration, then?

I wouldn't know the technical specifics of how to speed up the code, to be honest, but intuitively I would say that if you make the problem easier, i.e. fewer parameters to estimate, it will be faster. For example, remove the types of seasonality that you do not need: if you have 30000 data samples per stock (series), I am going to go ahead and assume that you have intraday data, in which case you could possibly turn off estimation of yearly seasonality. You could also estimate a smaller, fixed number of changepoints, etc.

If you can live without the full inference output, turn off the mcmc.samples option; that takes a lot of time, of course.
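As a rough sketch of these suggestions in the Python package, the constructor arguments might look like the dictionary below. The argument names are prophet's own (`mcmc_samples` is the Python spelling of `mcmc.samples`), but the specific values are illustrative assumptions, not tuned recommendations, and `FAST_KWARGS` is a made-up name.

```python
# Constructor arguments for a lighter model, intended to be passed as
# Prophet(**FAST_KWARGS). Values are illustrative assumptions only.
FAST_KWARGS = {
    "growth": "linear",            # linear trend with changepoints
    "yearly_seasonality": False,   # skip a component intraday data may not need
    "n_changepoints": 10,          # fewer changepoints than the default of 25
    "mcmc_samples": 0,             # no MCMC sampling, point (MAP) estimate only
}
```

Whether dropping a seasonal component is appropriate depends on the data, so it is worth comparing forecast error with and without it before committing to the smaller model.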

gongshaojie12 commented 6 years ago

Each model takes 3 minutes to train. I have set mcmc_samples=0. My parameters are below:

changepoint_prior_scale = 0.05
weekly_seasonality = False
daily_seasonality = False
monthly_seasonality = True
yearly_seasonality = True
changepoints = None
seasonality_mode = 'additive'
seasonality_prior_scale = 10.0
mcmc_samples = 0
interval_width = 0.80

How can I get the training time under 10 seconds? I have 10,000 models to train. My freq parameter is 5min.

APramov commented 6 years ago

So you have 5-min (stock price) data? If that is the case, I am not sure how much yearly seasonality would help; maybe change it to daily, or leave seasonality out altogether and go with the trend only (though intraday seasonal patterns are there and are interesting to look at). Coming back to your question: I don't know how you can optimize the speed beyond what you have already done by further tweaking the arguments of the prophet function, sorry.

gongshaojie12 commented 6 years ago

Thanks a lot @APramov

bletham commented 6 years ago

The most direct lever to reduce training time would be to reduce the amount of input data, e.g. by subsampling the data. You can reduce prediction time by reducing the uncertainty.samples parameter. Making that 100 (the default is 1000) would make predictions 10x faster and would not affect the main prediction yhat (it would just make the yhat_lower and yhat_upper a bit more variable).
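A minimal sketch of both levers, under stated assumptions: `subsample` is a hypothetical helper that keeps every n-th row (one of several possible ways to thin the data), the toy series is made up, and `uncertainty_samples` is the Python spelling of `uncertainty.samples`, passed to the model constructor.

```python
def subsample(rows, step):
    """Keep every `step`-th (timestamp, value) row, preserving order."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return rows[::step]


# Toy series: twelve 5-minute bars labelled by minute offset within an hour.
five_min = [(minute, 100.0 + minute) for minute in range(0, 60, 5)]
thirty_min = subsample(five_min, step=6)  # one bar per 30 minutes

# Fewer draws for the uncertainty intervals; intended for the Prophet
# constructor, e.g. Prophet(**MODEL_KWARGS).
MODEL_KWARGS = {"uncertainty_samples": 100}
```

Taking every sixth row cuts the input to one sixth of its size. Aggregating each 30-minute bucket instead (for example, using the last price in the bucket) would be an alternative that discards less information.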

gongshaojie12 commented 6 years ago

Thanks a lot @bletham