AutoViML / Auto_TS

Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Created by Ram Seshadri. Collaborators welcome.
Apache License 2.0

Variation in SARIMAX and Prophet forecast dates #19

Closed raghurajpandya closed 3 years ago

raghurajpandya commented 4 years ago

When SARIMAX is the best model and the forecast period is 12, the forecast [defaultdict output] covers 12 periods within the train data instead of 12 forward periods, whereas Prophet forecasts 12 forward periods.

Best Model is: SARIMAX
Best Model Forecasts:
net_revenue      mean   mean_se  mean_ci_lower  mean_ci_upper
2018-07-31   3.080343  0.366081       2.362838       3.797848
2018-08-31   2.808159  0.366107       2.090602       3.525715
2018-09-30   2.893071  0.367888       2.172024       3.614118
2018-10-31   3.178024  0.375812       2.441446       3.914603
2018-11-30   3.154994  0.379846       2.410510       3.899477
2018-12-31   3.122128  0.381915       2.373588       3.870667
2019-01-31   2.917377  0.382981       2.166749       3.668006
2019-02-28   3.091802  0.383531       2.340096       3.843509
2019-03-31   3.260538  0.383815       2.508274       4.012801
2019-04-30   3.008322  0.383962       2.255770       3.760873
2019-05-31   3.490022  0.384038       2.737322       4.242723
2019-06-30   3.180734  0.384077       2.427956       3.933511
Best Model Score: 0.39
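To make the expected behaviour concrete, here is a minimal sketch of what "12 forward periods" would look like for monthly data. The helper `forward_month_ends` is hypothetical (it is not part of Auto_TS), and the training end date of 2019-06-30 is an assumption made only for illustration, since the thread does not state it.

```python
import calendar
from datetime import date
from typing import List


def forward_month_ends(last_train_date: date, periods: int) -> List[date]:
    """Return the `periods` month-end dates that follow `last_train_date`."""
    year, month = last_train_date.year, last_train_date.month
    dates = []
    for _ in range(periods):
        # Step to the next calendar month, rolling over the year after December.
        month += 1
        if month > 12:
            month, year = 1, year + 1
        last_day = calendar.monthrange(year, month)[1]
        dates.append(date(year, month, last_day))
    return dates


# Assumption for illustration only: the training data ends on 2019-06-30.
train_end = date(2019, 6, 30)
expected = forward_month_ends(train_end, periods=12)

# First forecast date in the SARIMAX output reported above.
reported_first = date(2018, 7, 31)
is_out_of_sample = reported_first > train_end

print("expected forward range:", expected[0], "to", expected[-1])
print("reported forecast starts after train end:", is_out_of_sample)
```

Under that assumption, a forward forecast would start one month after the training data ends, which is a quick way to check whether a returned forecast index is in-sample or out-of-sample.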

ngupta23 commented 4 years ago

Hi @raghurajpandya, I am working on making enhancements to the library. I will verify this and make the necessary changes if I am able to recreate this. Currently the changes have been made to the SARIMAX and ML models (but still more needs to be done). The latest changes are on the develop branch.

raghurajpandya commented 4 years ago

Thanks @ngupta23. Does auto-ts make the multivariate time series stationary for the SARIMAX model? And is SARIMAX applied to the target of a multivariate input as a univariate time series analysis?

Thanks

ngupta23 commented 4 years ago

Hi @raghurajpandya , Could you explain why you would want to make the TS stationary for SARIMAX? The model has trend included through the difference terms and seasonality included as well (which would mean that the time series is non-stationary).
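The point about difference terms can be illustrated with a small sketch. It uses a made-up series (a linear trend plus a repeating 12-month pattern) and a hypothetical `difference` helper to show what the regular (d) and seasonal (D) differencing orders of a SARIMAX model do to the data. It is not Auto_TS or statsmodels code.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy monthly data, made up for illustration: linear trend + 12-month pattern.
seasonal_pattern = [0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 2, -3]
series = [0.5 * t + seasonal_pattern[t % 12] for t in range(48)]

# Seasonal differencing (D=1, s=12) cancels the repeating pattern and
# leaves only the constant year-over-year increase from the trend.
seasonal_diff = difference(series, lag=12)

# Regular differencing (d=1) then removes that remaining constant level.
fully_diff = difference(seasonal_diff, lag=1)

print(set(seasonal_diff))
print(set(fully_diff))
```

Because the model differences the data internally through these orders, the input series does not need to be made stationary by hand before fitting.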

Regarding your second question, the version of autoTS on develop does include Multivariate TS for SARIMAX.

raghurajpandya commented 4 years ago

@ngupta23 I am not sure why I was considering stationarity for the SARIMAX model. With Prophet (with added regressors for lead indicators/features), I found that stationary feature indicators were recommended, and they did improve the predictions. I am not sure whether that would have a similar effect with SARIMAX.

Regarding the second question: thanks. I have now installed it and it is running as I type. [pip install git+git://github.com/AutoViML/Auto_TS@develop]

ngupta23 commented 4 years ago

@raghurajpandya Please note that the interface for the new version on develop is different from what is there in master. The new interface uses scikit type of methods. You initialize the AutoML object, then you call the fit() and predict() methods. Once I am close to cleaning up, I will provide detailed examples for reference, but for now, you can refer to the unit test directory on develop to look at the flow.
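The initialize, fit(), predict() flow described here can be sketched with a toy stand-in. The class below is hypothetical and is NOT the Auto_TS interface: its name, constructor arguments, and method signatures are assumptions chosen only to show the scikit-style call order, with a seasonal-naive forecast standing in for real model building.

```python
class SeasonalNaiveForecaster:
    """Toy stand-in (not the Auto_TS class) with a scikit-style interface."""

    def __init__(self, forecast_period=12, seasonal_period=12):
        # Step 1: configuration happens at construction time; no data yet.
        self.forecast_period = forecast_period
        self.seasonal_period = seasonal_period
        self._last_season = None

    def fit(self, history):
        # Step 2: learn from the training data (here: remember the last season).
        if len(history) < self.seasonal_period:
            raise ValueError("need at least one full season of history")
        self._last_season = list(history[-self.seasonal_period:])
        return self  # returning self allows chaining, as in scikit-learn

    def predict(self):
        # Step 3: produce forward forecasts by repeating the last season.
        if self._last_season is None:
            raise RuntimeError("call fit() before predict()")
        return [
            self._last_season[i % self.seasonal_period]
            for i in range(self.forecast_period)
        ]


history = list(range(1, 25))  # made-up data: two "years" of monthly values
model = SeasonalNaiveForecaster(forecast_period=3, seasonal_period=12)
forecast = model.fit(history).predict()
print(forecast)
```

For the actual class name and arguments, the unit test directory on develop remains the reference.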

raghurajpandya commented 4 years ago

Thanks. I have got it running on my 174x48 data set.

I have noticed that the VAR model takes longer on develop than on master. Is there something new in it? Does the develop AutoML include Prophet with added regressors? Also, for the ML model in Auto_TS, would you suggest using tsfresh or similar feature generators on selected causal lead indicators (Granger test) before passing them on for best model selection?

ngupta23 commented 4 years ago

I just got done adding Multivariate support for Prophet. You can find an example here. Make sure you pull the latest version from the web (develop branch).

The ML models will need to be updated to add additional features such as various features from the timestamp, etc. This is planned to be started shortly. I will look into tsfresh and see if that can and should be incorporated as well.

I have not changed anything in the VAR models apart from converting them to a class. I think the way it is set up right now, it adds one exogenous variable at a time and builds a model, then at the end it picks the best model (with only one exogenous variable). I see you have a lot of exogenous variables, so that might be slowing this down. I plan to look at improving VAR, but that will come at a later point in time.
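A rough sketch of the selection loop described above, to show why the run time grows with the number of exogenous columns. This is not the Auto_TS implementation: a one-variable least-squares fit stands in for a VAR model, and all names and data are made up for illustration.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ~ x (stand-in for a VAR fit)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def pick_best_single_exog(target, candidates):
    """Fit one model per exogenous variable and keep the lowest-RMSE one."""
    scores = {}
    for name, values in candidates.items():  # one full model fit per column
        intercept, slope = fit_simple_ols(values, target)
        fitted = [intercept + slope * v for v in values]
        scores[name] = rmse(target, fitted)
    best = min(scores, key=scores.get)
    return best, scores


# Made-up data: one target and three candidate exogenous variables.
target = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]
candidates = {
    "lead_a": [1, 2, 3, 4, 5, 6],
    "lead_b": [3, 1, 4, 1, 5, 9],
    "lead_c": [2, 7, 1, 8, 2, 8],
}
best_name, all_scores = pick_best_single_exog(target, candidates)
print(best_name, len(all_scores))
```

With a loop of this shape, a data set with dozens of exogenous columns triggers dozens of separate model fits, which is consistent with the slowdown reported here.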

Hope this helps.

raghurajpandya commented 4 years ago

Thanks @ngupta23

ngupta23 commented 4 years ago

Hi @raghurajpandya,

I evaluated tsfresh and I don't think it is applicable here. We have only a single time series here, so whatever features we calculate will apply to every time point in that series. So I am not sure how this would help.

I think tsfresh is applicable more to cases when you have multiple time series and you want the ability to somehow distinguish or cluster them.
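The concern can be shown with a minimal sketch of per-series feature extraction. It does not call tsfresh. The `summary_features` and `extract_per_series` helpers are hypothetical, and the numbers are made up. The point is only the shape of the output: one feature row per series.

```python
import statistics


def summary_features(values):
    """Whole-series summary statistics: one value per feature, per series."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def extract_per_series(series_by_id):
    """Return one feature row for each series id."""
    return {sid: summary_features(v) for sid, v in series_by_id.items()}


# Made-up data: a single series versus several series.
single = {"net_revenue": [3.1, 2.8, 2.9, 3.2, 3.2, 3.1]}
many = {
    "store_a": [3.1, 2.8, 2.9, 3.2],
    "store_b": [7.4, 7.9, 8.3, 8.8],
    "store_c": [1.0, 1.0, 0.9, 1.1],
}

single_features = extract_per_series(single)
many_features = extract_per_series(many)
print(len(single_features), len(many_features))
```

With a single series there is only one row, so the features are constant across all time stamps. With several series, the rows differ and can be used to compare or cluster them.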

raghurajpandya commented 4 years ago

Hello @ngupta23. I am looking to use tsfresh to decompose the exogenous features / lead indicator time series into multiple features for each time stamp, then rank those features (PCA or similar) to find the ones with the most information, and then use them to build a supervised regression model to predict the target.

Regarding VAR: between auto-ts versions 0.0.20 and 0.0.21, the 'stats' model has become markedly slower, largely because of the additional variable calculated in the VAR model. Is there a way to run only SARIMAX in 0.0.21 or 0.0.20 without needing to run VAR?
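One way to read "multiple features for each time stamp" is to compute features over a trailing window that ends at each time stamp. The sketch below is an assumption about that intent, not tsfresh code. The `rolling_features` helper and the data are made up for illustration.

```python
import statistics


def rolling_features(values, window):
    """Compute features from the trailing `window` values at each time stamp."""
    if window < 2:
        raise ValueError("window must be at least 2")
    rows = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]  # uses only values up to time end-1
        rows.append({
            "t": end - 1,
            "mean": statistics.mean(chunk),
            "range": max(chunk) - min(chunk),
            "last_change": chunk[-1] - chunk[-2],
        })
    return rows


# Made-up lead indicator series.
lead_indicator = [10, 12, 11, 15, 14, 18, 17, 21]
features = rolling_features(lead_indicator, window=4)
print(len(features), features[0])
```

Each row uses only past and current values, so the resulting feature table can be aligned with the target by time stamp and passed to a regression model without looking ahead.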

Thanks

ngupta23 commented 4 years ago

Yes, you can now selectively run models by specifying the model_type argument when instantiating the auto_ts object. It accepts a list of model names. You can also mix and match groups of models with individual models (for example, run all 'stats' models plus 'prophet').

automl_model = ATS(
            score_type='rmse', forecast_period=12, time_interval='Month',
            non_seasonal_pdq=None, seasonality=False, seasonal_period=12,
            model_type=['SARIMAX', 'ML'],
            verbose=0)

I recommend checking out the documentation of this class for more detail on this:

:param model_type The type(s) of model to build. Default to building only statistical models
Can be a string or a list of models. Allowed values are: 
'best', 'prophet', 'pyflux', 'stats', 'ARIMA', 'SARIMAX', 'VAR', 'ML'.
"prophet" will build a model using FB Prophet -> this means you must have FB Prophet installed
"stats" will build statsmodels based ARIMA, SARIMAX and VAR models
"ML" will build a machine learning model using Random Forests provided explanatory vars are given
'best' will try to build all models and pick the best one
If a list is provided, then only those models will be built
WARNING: "best" might take some time for large data sets. We recommend that you
choose a small sample from your data set before attempting to run entire data.
:type model_type: Union[str, List]
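As a reading aid for the docstring above, here is a hedged sketch of how a string-or-list model_type could be normalised into a flat list of models. The function `resolve_model_types` is hypothetical and not taken from Auto_TS. The 'stats' expansion follows the docstring, while the exact list behind 'best' is an assumption.

```python
# Per the docstring: "stats" builds ARIMA, SARIMAX and VAR models.
STATS_MODELS = ["ARIMA", "SARIMAX", "VAR"]
# Assumption: what "all models" means for 'best' is not spelled out above.
ALL_MODELS = ["prophet"] + STATS_MODELS + ["ML"]


def resolve_model_types(model_type):
    """Expand a string or list of model names into a de-duplicated list."""
    requested = [model_type] if isinstance(model_type, str) else list(model_type)
    resolved = []
    for name in requested:
        if name == "best":
            expanded = ALL_MODELS
        elif name == "stats":
            expanded = STATS_MODELS
        else:
            expanded = [name]
        for model in expanded:
            if model not in resolved:  # keep first occurrence, preserve order
                resolved.append(model)
    return resolved


print(resolve_model_types("SARIMAX"))
print(resolve_model_types(["SARIMAX", "ML"]))
print(resolve_model_types(["stats", "prophet"]))
```

Read this way, passing ['SARIMAX', 'ML'] skips VAR entirely, which is the behaviour asked for earlier in the thread.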

I am still not completely sure how you plan to use tsfresh for a single time series (you will get the same feature for each time stamp of the time series, as per the example here). Can you prepare a small example notebook showing how you plan to use it and share it here?

raghurajpandya commented 4 years ago

Thanks @ngupta23. Yes, I will add a small example of how I plan to use tsfresh.

raghurajpandya commented 3 years ago

A simple example of feature extraction with tsfresh: https://github.com/alan-turing-institute/sktime/blob/master/examples/feature_extraction_with_tsfresh.ipynb

AutoViML commented 3 years ago

@raghurajpandya: Can you please retry with the latest version, 0.0.26, and let us know if the problem is fixed? Thanks

AutoViML commented 3 years ago

This has been fixed. Just upgrade to the latest version with: "pip install auto_ts --ignore-installed --no-cache-dir"