firmai / atspy

AtsPy: Automated Time Series Models in Python (by @firmai)
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3580631

Automated Time Series Models in Python (AtsPy)

Easily develop state-of-the-art time series models to forecast univariate data series. Simply load your data and select which models you want to test. This is the largest repository of automated structural and machine learning time series models. Please get in contact if you want to contribute a model. This is a fledgling project; all advice is appreciated.

Install

pip install atspy

Automated Models

  1. ARIMA - Automated ARIMA Modelling
  2. Prophet - Modeling Multiple Seasonality With Linear or Non-linear Growth
  3. HWAAS - Exponential Smoothing With Additive Trend and Additive Seasonality
  4. HWAMS - Exponential Smoothing with Additive Trend and Multiplicative Seasonality
  5. NBEATS - Neural basis expansion analysis (now fixed at 20 Epochs)
  6. Gluonts - RNN-based Model (now fixed at 20 Epochs)
  7. TATS - Seasonal and Trend, no Box-Cox
  8. TBAT - Trend and Box-Cox
  9. TBATS1 - Trend, Seasonal (one), and Box-Cox
  10. TBATP1 - TBATS1 but Seasonal Inference is Hardcoded by Periodicity
  11. TBATS2 - TBATS1 With Two Seasonal Periods

Why AtsPy?

  1. Implements all your favourite automated time series models in a unified manner by simply running AutomatedModel(df).
  2. Reduces structural model errors by 30%-50% by using LightGBM with TSFresh-infused features.
  3. Automatically identify the seasonalities in your data using singular spectrum analysis, periodograms, and peak analysis.
  4. Identifies and makes accessible the best model for your time series using in-sample validation methods.
  5. Combines the predictions of all these models in simple (average) and complex (GBM) ensembles for improved performance.
  6. Where appropriate, models have been developed to use GPU resources to speed up the automation process.
  7. Easily access all the models by using am.models_dict_in for in-sample and am.models_dict_out for out-of-sample prediction.
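
To make the periodogram idea in point 3 concrete, here is a minimal sketch that estimates the dominant seasonal period of a series using only NumPy. It is not AtsPy's implementation, which also draws on singular spectrum analysis and peak analysis. The function name `dominant_period` and the synthetic series are assumptions for illustration.

```python
import numpy as np

def dominant_period(values):
    """Return the period (in samples) of the strongest periodogram peak."""
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Remove a linear trend so it does not dominate the low frequencies.
    t = np.arange(n)
    slope, intercept = np.polyfit(t, x, 1)
    detrended = x - (slope * t + intercept)
    # Periodogram: squared magnitude of the real FFT.
    power = np.abs(np.fft.rfft(detrended)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0)
    # Skip the zero-frequency bin, which has no finite period.
    peak = np.argmax(power[1:]) + 1
    return 1.0 / freqs[peak]

# Synthetic monthly-style series: linear trend plus a 12-step seasonal cycle.
t = np.arange(120)
series = 50 + 0.3 * t + 10 * np.sin(2 * np.pi * t / 12)
period = dominant_period(series)
```

On this synthetic input the estimate should land at about 12 samples. Real data usually has several competing peaks, so a single `argmax` is only a starting point.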

AtsPy Progress

  1. Univariate forecasting only (single column); only monthly and daily data have been tested for suitability.
  2. More work lies ahead; all suggestions and criticisms are appreciated. Please use the issues tab.
  3. A Google Colab notebook is available to run the package in the cloud, and a second one runs all the models.

Documentation by Example


Load Package

from atspy import AutomatedModel

Pandas DataFrame

The data requires strict preprocessing: no periods can be skipped, and there cannot be any empty values.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/firmai/random-assets-two/master/ts/monthly-beer-australia.csv")
df.Month = pd.to_datetime(df.Month)
df = df.set_index("Month"); df
            Megaliters
Month
1956-01-01        93.2
1956-02-01        96.0
1956-03-01        95.2
1956-04-01        77.1
1956-05-01        70.9
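
The preprocessing requirement above (no skipped periods, no empty values) can be checked with plain pandas before a frame is handed to AtsPy. The sketch below is one way to do that. The helper name `check_series`, the sample values, and the interpolation-based repair are assumptions for illustration and not part of the AtsPy API.

```python
import pandas as pd

def check_series(df, freq):
    """Return (missing_periods, empty_value_count) for a datetime-indexed frame."""
    # Build the full, gap-free index implied by the first and last timestamps.
    expected = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    missing_periods = expected.difference(df.index)
    empty_values = int(df.isna().sum().sum())
    return missing_periods, empty_values

# Illustrative monthly data with March skipped and one empty value.
raw = pd.DataFrame(
    {"Megaliters": [93.2, 96.0, float("nan"), 70.9]},
    index=pd.to_datetime(["1956-01-01", "1956-02-01", "1956-04-01", "1956-05-01"]),
)
missing, empty = check_series(raw, freq="MS")

# One possible repair: reindex to the full monthly range, then interpolate linearly.
clean = raw.asfreq("MS").interpolate()
```

Interpolation invents values, so whether it is acceptable depends on the data. Dropping the affected stretch or fixing the gap at the source may be better.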

AutomatedModel

  1. AutomatedModel - Returns a class instance.
  2. forecast_insample - Returns an in-sample forecasted dataframe and performance.
  3. forecast_outsample - Returns an out-of-sample forecasted dataframe.
  4. ensemble - Returns the results of three different forms of ensembles.
  5. models_dict_in - Returns a dictionary of the fully trained in-sample models.
  6. models_dict_out - Returns a dictionary of the fully trained out-of-sample models.
from atspy import AutomatedModel
model_list = ["HWAMS","HWAAS","TBAT"]
am = AutomatedModel(df=df, model_list=model_list, forecast_len=20)

Other models to try, add as many as you like; note ARIMA is slow: ["ARIMA","Gluonts","Prophet","NBEATS", "TATS", "TBATS1", "TBATP1", "TBATS2"]

In-Sample Performance

forecast_in, performance = am.forecast_insample(); forecast_in
            Target       HWAMS       HWAAS        TBAT
Date
1985-10-01   181.6  161.962148  162.391653  148.410071
1985-11-01   182.0  174.688055  173.191756  147.999237
1985-12-01   190.0  189.728744  187.649575  147.589541
1986-01-01   161.2  155.077205  154.817215  147.180980
1986-02-01   155.5  148.054292  147.477692  146.773549
performance
          Target       HWAMS       HWAAS         TBAT
rmse    0.000000   17.599400   18.993827    36.538009
mse     0.000000  309.738878  360.765452  1335.026136
mean  155.293277  142.399639  140.577496   126.590412
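
For readers who want to see how a table like this can be derived, the sketch below recomputes rmse, mse, and mean from the five in-sample rows printed above. It assumes that each column is compared against Target and that "mean" is the plain column mean. The zero error for Target is consistent with that reading, but it is not confirmed by the AtsPy source. Because only five rows are used, the numbers will not match the full table.

```python
import numpy as np
import pandas as pd

# First five in-sample rows shown above; the full frame is longer.
forecast_in = pd.DataFrame(
    {
        "Target": [181.6, 182.0, 190.0, 161.2, 155.5],
        "HWAMS": [161.962148, 174.688055, 189.728744, 155.077205, 148.054292],
        "HWAAS": [162.391653, 173.191756, 187.649575, 154.817215, 147.477692],
        "TBAT": [148.410071, 147.999237, 147.589541, 147.180980, 146.773549],
    },
    index=pd.date_range("1985-10-01", periods=5, freq="MS", name="Date"),
)

# Error of every column relative to the target (Target itself gives zeros).
errors = forecast_in.sub(forecast_in["Target"], axis=0)
mse = (errors ** 2).mean()
rmse = np.sqrt(mse)

# Assemble the same layout as the performance table: metrics as rows.
performance = pd.DataFrame(
    {"rmse": rmse, "mse": mse, "mean": forecast_in.mean()}
).T
```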

Out-of-Sample Forecast

forecast_out = am.forecast_outsample(); forecast_out
                 HWAMS       HWAAS        TBAT
Date
1995-09-01  137.518755  137.133938  142.906275
1995-10-01  164.136220  165.079612  142.865575
1995-11-01  178.671684  180.009560  142.827110
1995-12-01  184.175954  185.715043  142.790757
1996-01-01  147.166448  147.440026  142.756399
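
A simple (average) ensemble of the kind AtsPy describes can be sketched from these forecasts with a row-wise mean. The column names `HWAMS_HWAAS` and `HWAMS_HWAAS_TBAT` mirror names that appear in AtsPy's ensemble output, but treating them as plain unweighted averages is an assumption made here, not something the package documentation confirms.

```python
import pandas as pd

# Out-of-sample rows copied from the table above.
forecast_out = pd.DataFrame(
    {
        "HWAMS": [137.518755, 164.136220, 178.671684, 184.175954, 147.166448],
        "HWAAS": [137.133938, 165.079612, 180.009560, 185.715043, 147.440026],
        "TBAT": [142.906275, 142.865575, 142.827110, 142.790757, 142.756399],
    },
    index=pd.date_range("1995-09-01", periods=5, freq="MS", name="Date"),
)

# Assumed definition of a simple ensemble: unweighted mean across models.
forecast_out["HWAMS_HWAAS"] = forecast_out[["HWAMS", "HWAAS"]].mean(axis=1)
forecast_out["HWAMS_HWAAS_TBAT"] = forecast_out[["HWAMS", "HWAAS", "TBAT"]].mean(axis=1)
```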

Ensemble and Model Validation Performance

all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)
all_performance
rmse mse mean
ensemble_lgb__X__HWAMS 9.697588 94.043213 146.719412
ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS__X__ensemble_ts__X__HWAAS 9.875212 97.519817 145.250837
ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS 11.127326 123.817378 142.994374
ensemble_lgb 12.748526 162.524907 156.487208
ensemble_lgb__X__HWAMS__X__HWAMS_HWAAS__X__ensemble_ts__X__HWAAS__X__HWAMS_HWAAS_TBAT__X__TBAT 14.589155 212.843442 138.615567
HWAMS 15.567905 242.359663 136.951615
HWAMS_HWAAS 16.651370 277.268110 135.544299
ensemble_ts 17.255107 297.738716 163.134079
HWAAS 17.804066 316.984751 134.136983
HWAMS_HWAAS_TBAT 23.358758 545.631579 128.785846
TBAT 39.003864 1521.301380 115.268940
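
Since `all_performance` prints like an ordinary pandas DataFrame, picking the best candidate is a one-line sort. The sketch below rebuilds a few rows of the table above by hand so that it runs without AtsPy installed. With the real object, the same `sort_values` call should apply, assuming it is indeed a DataFrame with an `rmse` column.

```python
import pandas as pd

# A subset of the rows printed above, deliberately out of order.
all_performance = pd.DataFrame(
    {
        "rmse": [15.567905, 39.003864, 9.697588, 17.804066],
        "mse": [242.359663, 1521.301380, 94.043213, 316.984751],
    },
    index=["HWAMS", "TBAT", "ensemble_lgb__X__HWAMS", "HWAAS"],
)

# Rank candidates from lowest to highest validation RMSE.
ranked = all_performance.sort_values("rmse")
best_name = ranked.index[0]
```

In the printed table the LightGBM ensemble combined with HWAMS has the lowest rmse, which is why it is the column plotted in the next section.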

Best Performing In-sample

all_ensemble_in[["Target","ensemble_lgb__X__HWAMS","HWAMS","HWAAS"]].plot()

Future Predictions All Models

all_ensemble_out[["ensemble_lgb__X__HWAMS","HWAMS","HWAAS"]].plot()

And Finally Grab the Models

am.models_dict_in
{'HWAAS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f42f7822d30>,
 'HWAMS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f42f77fff60>,
 'TBAT': <tbats.tbats.Model.Model at 0x7f42d3aab048>}
am.models_dict_out
{'HWAAS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f9c01309278>,
 'HWAMS': <statsmodels.tsa.holtwinters.HoltWintersResultsWrapper at 0x7f9c01309cf8>,
 'TBAT': <tbats.tbats.Model.Model at 0x7f9c08f18ba8>}

Follow this link if you want to run the package in the cloud.

AtsPy Future Development

  1. Additional in-sample validation steps to stop deep learning models from over- and underfitting.
  2. Extra performance metrics like MAPE and MAE.
  3. Improved methods to select the window length to use in training and calibrating the model.
  4. Add the ability to accept dirty data and clean it up (interpolation, etc.).
  5. Add a function to resample to a larger frequency for big datasets.
  6. Add the ability to algorithmically select a good enough chunk of a large dataset to balance performance and time to train.
  7. More internal model optimisation using AIC, BIC, and AICc.
  8. Code annotations for other developers to follow and improve on the work being done.
  9. Force seasonality stability between in and out of sample training models.
  10. Make AtsPy less dependency-heavy; it currently draws on tensorflow, pytorch and mxnet.

Citations

If you use AtsPy in your research, please consider citing it. I have also written a small report that can be found on SSRN.

BibTeX entry:

@software{atspy,
  title = {{AtsPy}: Automated Time Series Models in Python.},
  author = {Snow, Derek},
  url = {https://github.com/firmai/atspy/},
  version = {1.15},
  date = {2020-02-17},
}
@misc{atspy,
  author = {Snow, Derek},
  title = {{AtsPy}: Automated Time Series Models in Python (1.15).},
  year  = {2020},
  url   = {https://github.com/firmai/atspy/},
}