Open orenmatar opened 3 years ago
@orenmatar and @tcuongd do you know if anyone has made an attempt on this yet? Otherwise I might take a try at integrating that code.
@winedarksea I don't know of any attempt... and would love to see one. If you have any questions about the code - I'd be happy to help. BTW, I'm working on optimizing the fit now. It can be done without using pystan, which consumes a lot of time.
@orenmatar thanks, hoping to get this done in the next week or two (perhaps a foolish hope). First I have to figure out the new cmdstanpy backend on the develop branch.
Some questions for you, @orenmatar. With `Prophet(mcmc_samples=300)` and flat growth, is this the right way to build the trend samples?

```python
elif prophet_obj.growth == "flat":
    sample_trends = np.zeros((k, len(forecast_df)))
```
I also tried this variation based on the `flat_trend` function, but it is wrong: the samples are not centered around 0.
```python
elif prophet_obj.growth == "flat":
    sample_trends = np.repeat(
        prophet_obj.params['m'] * np.ones_like(future_time_series),
        k, axis=0
    )
```
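For what it's worth, a minimal sketch of how flat-growth samples could come out centered correctly (the names `k`, `n`, and `m` here are illustrative stand-ins for the posterior sample count, the forecast horizon length, and `params['m']` from the snippets above; this is not Prophet's actual code). Note that `np.zeros` expects a shape tuple, so it must be `np.zeros((k, n))`, not `np.zeros(k, n)`:

```python
import numpy as np

k = 300   # number of posterior samples (e.g. mcmc_samples)
n = 100   # number of rows in the forecast horizon
m = 0.5   # hypothetical flat-trend offset, standing in for params['m']

# Variant 1: trend *deviations* for flat growth are identically zero,
# since a flat trend has no changepoint uncertainty to sample.
sample_trend_deviations = np.zeros((k, n))

# Variant 2: absolute trend samples, one row per posterior draw, all
# equal to the flat offset m (so they are centered at m, not at 0).
sample_trends = np.tile(m * np.ones(n), (k, 1))

assert sample_trend_deviations.shape == (k, n)
assert sample_trends.shape == (k, n)
```

Whether the deviations (centered at 0) or the absolute trends (centered at `m`) are wanted depends on which term the surrounding prediction code adds them to.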
@winedarksea
Good luck! Looking forward to seeing this implemented!
I've been away on vacation in Italy, but I'm back!
I have a question that is really for the Prophet maintainers, if they will come to the party. I am FB internal, and from what I can tell Ben Letham has entirely moved on to other projects; for example, in Workspaces for Prophet he ignored call-outs from others when there was a bug in the internal build. So a request to @dmitryvinn and @tcuongd: with whom should I coordinate this work?
My basic question is how many internal functions supporting prediction need to be maintained. The simplest way to integrate these speed-ups would be to cut the number of functions, because most of them are nested in for loops that will no longer be needed.
The following would become obsolete: `sample_posterior_predictive`, `sample_model`, and `sample_predictive_trend`.
Alternatives: keep them unchanged but stop calling them from `predict_uncertainty`, or rewrite them to delegate to the new method. `sample_posterior_predictive` would be fairly easy to update to the new method; the others would be harder.
A possible compromise would be to use the old code path when `mcmc_samples` is set (nonzero), which users already expect to be slow, and use the new way (with a trimmed-down set of supporting functions) in all other cases.
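That compromise could be dispatched with a single branch. A purely illustrative sketch (the helper names `_predict_uncertainty_legacy` and `_predict_uncertainty_vectorized`, and the `ProphetLike` stub, are hypothetical, not part of Prophet's API):

```python
class ProphetLike:
    """Minimal hypothetical stand-in for the Prophet model object."""

    def __init__(self, mcmc_samples=0):
        self.mcmc_samples = mcmc_samples

    def predict_uncertainty(self, df):
        # MCMC users already expect a slow predict; keep the old code
        # path (and the helper functions it calls) fully intact.
        if self.mcmc_samples > 0:
            return self._predict_uncertainty_legacy(df)
        # MAP fits take the new vectorized path with fewer nested helpers.
        return self._predict_uncertainty_vectorized(df)

    def _predict_uncertainty_legacy(self, df):
        return "legacy"      # placeholder for the current looped code

    def _predict_uncertainty_vectorized(self, df):
        return "vectorized"  # placeholder for the new matrix-based code
```

This keeps the public surface of the old helpers untouched for anyone depending on them, while the common MAP case gets the fast path.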
@winedarksea How is it going with the project? BTW, I just published another post on optimizing Prophet, which you may find interesting... https://medium.com/towards-data-science/need-for-speed-optimizing-facebook-prophet-fit-method-to-run-20x-faster-166cd258f456
@orenmatar I messaged Ben Letham internally and his current priority is updating the internal platforming for Prophet at Meta which I may be helping with. But he did look at your work and liked it, so I think we will get this in - although that's a matter of months probably.
Neural Prophet (which is from a different team) uses PyTorch as well, and I think having a PyTorch alternative backend would be great since many people already have some familiarity with it. Although 60 ms is hardly slow to begin with!
@winedarksea I think 60 ms is slow compared to most linear regressions, which is what Prophet ultimately is. Some datasets really do contain hundreds of thousands of items, if not millions, so these gains could be meaningful. But for sure it is more important to optimize the predict method. Thanks for the update!
@orenmatar do you have a fork where you implemented the changes? Thanks!
@nicolaerosia I don't... I only implemented it as a side function to be called separately and @winedarksea started working on a fork. Not too difficult to replace the existing confidence interval with the new one. I'll be happy to help if there's a real chance it will be incorporated into the main package.
@orenmatar I have hopes for inclusion; it seems like @tcuongd is actively maintaining! @winedarksea could you please share your branch? I see something here, https://github.com/winedarksea/prophet/tree/vectorization, but it is not related.
@nicolaerosia sure thing. Right now I am working with the idea that the old way (unmodified) will run whenever `mcmc_samples` is set. The reason is that the new function will drop some of the inner function calls (buried in slow for loops) that are not private and may be in use by people we don't want to break. See my post above. Basically, I still have unresolved feelings about the best way to change the code base, since we don't want to break anything people may be depending on.
@winedarksea I think it's a good idea, do you have a branch I can take a look at? Thanks a lot!
Sure, I need to publish my changes; I'll try to do that tomorrow.
I was working on scaling up Prophet as a forecast engine for hundreds of thousands of items on a project of mine, and realized that predicting with Prophet takes the vast majority of the running time. After some research I figured out a way to speed it up significantly, and wrote the following post:
https://towardsdatascience.com/how-to-run-facebook-prophet-predict-x100-faster-cce0282ca77d#4b11-31bf17c6772
I think it would be great if something like this was implemented.
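To make the idea concrete: the core of the speedup is replacing the per-sample Python loop in uncertainty estimation with matrix operations. A rough, hypothetical sketch of just the observation-noise part (this is not Prophet's actual implementation, which also samples trend uncertainty beyond the history cutoff; `vectorized_intervals` and its parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def vectorized_intervals(yhat, sigma_obs, k=1000, lower=0.1, upper=0.9):
    """Draw k noise samples for the whole horizon at once, then take quantiles.

    yhat: point forecast, shape (n,).  sigma_obs: observation noise scale.
    Instead of looping k times and simulating one sample path per iteration,
    we draw a single (k, n) matrix of noise and let NumPy do the work.
    """
    n = yhat.shape[0]
    noise = rng.normal(0.0, sigma_obs, size=(k, n))   # (k, n) in one call
    samples = yhat[None, :] + noise                    # broadcast yhat over rows
    return (np.quantile(samples, lower, axis=0),
            np.quantile(samples, upper, axis=0))

lo, hi = vectorized_intervals(np.linspace(0.0, 10.0, 50), sigma_obs=0.5)
```

The same pattern (one big array draw plus broadcasting, instead of a Python-level loop over samples) is what yields the large speedups described in the post.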