facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

It is very time consuming for fbprophet to run in win 7 x64? #122

Closed frenet closed 4 years ago

frenet commented 7 years ago

Is it very time-consuming for fbprophet to run on Windows 7 x64? It has taken me about 3 hours, and I still cannot get a prediction from fbprophet with Python 3.5.2 and pystan 2.14.0.0.

import pandas as pd
import numpy as np
from fbprophet import Prophet
%matplotlib inline

holidays = pd.DataFrame({
    'holiday': 'double11',
    'ds': pd.to_datetime(['2009-11-11', '2010-11-11', '2011-11-11', '2012-11-11',
                          '2013-11-11', '2014-11-11', '2015-11-11', '2016-11-11']),
})
m = Prophet(holidays=holidays, holidays_prior_scale=0.2).fit(df)
forecast = m.predict(future)

[Image: fbprophet]

frenet commented 7 years ago

P.S. I have changed every n_jobs in pystan to 1.

bletham commented 7 years ago

Could you try running each of these commands in separate blocks to see at which point it is getting stuck?

m = Prophet()
m.fit(df)
forecast = m.predict(future)

frenet commented 7 years ago

It got stuck at the final statement, predict.

bletham commented 7 years ago

Could you try removing the last block, with mcmc_samples=500, and check if it runs? My experience with Jupyter has been that sometimes it gets delayed in displaying a block as being finished, so it is possible that the m.predict(future) could actually be working and then it is getting stuck on the MCMC sampling in the next block.

The MCMC sampling could be slow, and could be especially slow in Windows where we had to remove matrix multiplication from the Stan code to get it to compile. Try running it with just 10 samples to see if it at least finishes. A compounding factor is that I believe Stan cannot run chains in parallel in Windows; by default it runs 4 chains, which would then be run sequentially. You can pass arguments for pystan.StanModel.sampling through fit, such as chains=1.
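A minimal sketch of that quick check, assuming fit forwards extra keyword arguments to Stan's sampling call as described above. The helper name quick_mcmc_fit and the QUICK_CHECK dict are hypothetical, and df stands for your own history dataframe with ds and y columns:

```python
# Settings for a fast sanity check: very few samples and a single chain.
QUICK_CHECK = {"mcmc_samples": 10, "chains": 1}


def quick_mcmc_fit(df, mcmc_samples=QUICK_CHECK["mcmc_samples"],
                   chains=QUICK_CHECK["chains"]):
    """Fit Prophet with a tiny MCMC run to confirm that sampling finishes."""
    # Imported here so this file still loads where fbprophet is not installed.
    from fbprophet import Prophet

    m = Prophet(mcmc_samples=mcmc_samples)
    # Extra keyword arguments are passed on to pystan's sampling call.
    m.fit(df, chains=chains)
    return m
```

If this returns quickly, raising mcmc_samples in steps gives a rough idea of how long a full run would take.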

frenet commented 7 years ago

Thank you for your help. Almost everything happened the way you described, and I finally got the results in a short time. But I would still like to know: how can I get results with mcmc_samples=500?

bletham commented 7 years ago

I'm not very optimistic about this being fast in Python in Windows anytime soon. There are two issues, both upstream issues in PyStan. The first is that PyStan in Windows (unlike in Linux or OSX or RStan) cannot use multiple cores to run the chains in parallel. This means that on a 4-core machine, PyStan in Windows will immediately be ~4x slower than PyStan in Linux or OSX, or RStan. On top of that, Prophet's Stan model uses matrix multiplication to speed things up. Matrix multiplication does not work in PyStan in Windows (https://github.com/stan-dev/pystan/issues/308) which seems to be a difficult issue. So in the meantime, when Prophet is running in Python in Windows it uses a different model file which does not use matrix multiplication. I expect this to make it at least 2x slower (already on top of the 4x slower from not using multiple cores).
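The two factors multiply. A back-of-the-envelope sketch of that estimate, where the function name and its parameters are illustrative only and the 2x model penalty is just the lower bound guessed above:

```python
def estimated_slowdown(chains=4, cores=4, model_penalty=2.0):
    """Rough slowdown of PyStan on Windows relative to Linux/OSX."""
    # Chains that would run side by side elsewhere run one after another here.
    sequential_penalty = min(chains, cores)
    # Assumed cost of the Windows model file without matrix multiplication.
    return sequential_penalty * model_penalty


print(estimated_slowdown())  # 8.0
```

With chains=1 the sequential penalty disappears and only the model penalty remains.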

I will add a note to the documentation explaining that this will be slow in Python in Windows. In the meantime if you want to do MCMC in a reasonable amount of time, the only options are Linux (e.g. virtualbox) or R.

hassan-sabirin commented 7 years ago

Hi. I am new to fbprophet. I had a conversation with ahartikainen and he suggested I post something here.

First, I would like to clarify that it is not true that parallel processing doesn't work on Windows. However, the Python code must be written properly, as suggested here: https://docs.python.org/2/library/multiprocessing.html and http://stackoverflow.com/questions/20222534/python-multiprocessing-on-windows-if-name-main

The best practice is to write code like this:


def run():
    # your pystan/fbprophet code here
    pass

if __name__ == '__main__':
    run()

Secondly, it is possible to build and run pystan on Python 3.6 64-bit and mingw-64. You will not encounter any problems with matrix multiplication / vector addition operations.

bletham commented 7 years ago

Thanks @hassan-sabirin. Certainly multiprocessing works generally in Windows, but we rely on Stan to do the sampling. My understanding from the Stan documentation is that it doesn't work. See the line here "PyStan on Windows cannot use multiple processors in parallel.": http://pystan.readthedocs.io/en/latest/windows.html

The sampling would be happening in C++ code and so I think would be unrelated to multiprocessing.

As for matrix multiplication, this was identified in #2 as being an issue for a number of users. If setting up a different compiler makes this work then maybe we could have an optional argument to setup which specifies that we want to use the Unix stan file (with matrix multiplication) and not the Windows one. This would have to be a PR though.

hassan-sabirin commented 7 years ago

I haven't looked much into the numbers produced using parallel jobs under Windows. However, without protecting the code under an if __name__ == '__main__' guard, pystan with n_jobs > 1 under Windows will start creating new processes indefinitely. I suspect the same happens when running code under IPython and Jupyter.

If you can specify a (quick) pystan test case, I can run it under Windows with 4 jobs and pass back the results to see if they are similar to results under Linux. I have compared the 8 schools result and it looks OK to me.

On the left is pystan compiled with mingw64 + Python 3.6 64-bit running 8 schools with n_jobs=4. In the middle is the resulting plot. On the right is a benchmark I found somewhere on the internet.

[Image: img_20170423_211501]

Sorry about the phone screenshot. I shared it over the phone first, then decided to post here. I can run it again and have a better screenshot later if needed.
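For reference, a quick test case along these lines could look like the sketch below. It uses the eight schools model and data from the PyStan getting-started example with the PyStan 2 API. The iter value and the run wrapper are assumptions, not the exact script behind the screenshot, and the import is guarded so the file still runs where pystan is missing:

```python
SCHOOLS_CODE = """
data {
    int<lower=0> J;          // number of schools
    real y[J];               // estimated treatment effects
    real<lower=0> sigma[J];  // s.e. of effect estimates
}
parameters {
    real mu;
    real<lower=0> tau;
    real eta[J];
}
transformed parameters {
    real theta[J];
    for (j in 1:J)
        theta[j] = mu + tau * eta[j];
}
model {
    eta ~ normal(0, 1);
    y ~ normal(theta, sigma);
}
"""

SCHOOLS_DATA = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}


def run(n_jobs=4):
    """Compile the model and sample 4 chains using n_jobs processes."""
    try:
        import pystan
    except ImportError:
        print("pystan is not installed; nothing to run")
        return None
    sm = pystan.StanModel(model_code=SCHOOLS_CODE)
    fit = sm.sampling(data=SCHOOLS_DATA, iter=1000, chains=4, n_jobs=n_jobs)
    print(fit)
    return fit


# Without this guard, child processes on Windows re-import the script
# and each one would start sampling again.
if __name__ == "__main__":
    run()
```

Running the same file on Linux and on Windows and comparing the printed summaries for mu, tau and theta is one way to check that the parallel results agree.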

frenet commented 7 years ago

@bletham I share your understanding from the Stan documentation that it doesn't work. I struggled for a whole day to install pystan and run the example shipped with it. pystan is indeed not user friendly. Python 3.6 on Windows 7 is not supported because there is no corresponding compiled .whl file for pystan. I first installed Python 3.6 and then switched to Python 3.5, yet the pystan documentation says that Python 3.5 or higher must be used (http://pystan.readthedocs.io/en/latest/windows.html).

hassan-sabirin commented 7 years ago

@bletham Regarding the matrix multiplication issue, I believe this is an MSVC bug, not a stan/Eigen problem. It is possible to find a workaround such as rewriting the template in a different way. However, I think this should be part of stan-dev. The solution from fbprophet should be to provide a binary wheel of a build using mingw, which I think is what conda/anaconda actually provides. I have wheels here for pystan 64-bit, py2.7 & 3.6, built with Mingw64: https://drive.google.com/drive/u/1/folders/0B-PWaLZ40SDQTjFBV3lEaGgyV0U No warranty offered. You might have to install Mingw64 and set PATH to %MINGW64%/bin.

bletham commented 7 years ago

Wow, that's a great analysis. So if conda provides this it sounds like the best solution would be to encourage Windows users to install fbprophet via conda?

hassan-sabirin commented 7 years ago

I didn't fully test it, but it's available: https://anaconda.org/conda-forge/fbprophet

conda install -c conda-forge fbprophet

bletham commented 4 years ago

From what I can tell this is resolved by installing from conda, which is our official recommendation, so I'm going to close this.