The dataset is public and large (visits to Wikipedia projects), covering more than 100,000 time series, each with a daily history of two years.
The forecasting task is real: the data are current (as of 2017-08-12) and the future values are not yet available (the competition score is based on forecasts for next November ;)
The dataset has a very interesting hierarchical forecasting aspect (separate series for each web page inside each project).
While we want to have at least one Kaggle participation before the end of the competition (2017-09-10), we hope it will still be possible to submit forecasts after that deadline, so that future versions/evolutions of PyAF can also be tested.
The goal here is to be able to:
Create a piece of code that reads/parses the data and creates a submission to Kaggle in the right format (a lot of python/numpy/pandas plumbing ;), with a default prediction of zero everywhere (plumbing+zero-everywhere-task).
Adapt the previous code to use a default configuration of PyAF to generate individual forecasts (default-pyaf-individual-forecast-task). Forecasting more than 100,000 signals in a few hours is already a real performance challenge in itself (the forecasts need to be parallelized).
Extend the previous code to use a default configuration of PyAF to generate hierarchical forecasts (default-pyaf-hierarchical-forecast-task).
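For the plumbing+zero-everywhere-task, a minimal sketch of the submission plumbing is shown below. It assumes the competition's key file maps each (page, date) pair to an Id and that the submission format has two columns, Id and Visits; the file names and column names are assumptions to be checked against the competition's data page.

```python
# Sketch of the zero-everywhere Kaggle submission (plumbing only).
# Assumed layout: key_1.csv has columns (Page, Id), one row per
# (page, date) pair; the submission needs columns (Id, Visits).
import pandas as pd

def make_zero_submission(key_csv="key_1.csv", out_csv="submission.csv"):
    keys = pd.read_csv(key_csv)  # one row per (page, date) pair
    # Default prediction: zero visits everywhere.
    sub = pd.DataFrame({"Id": keys["Id"], "Visits": 0})
    sub.to_csv(out_csv, index=False)
    return sub
```

This gives a valid baseline submission against which the real forecasts can be compared.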
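For the default-pyaf-individual-forecast-task, the per-series work is embarrassingly parallel. The sketch below parallelizes a per-series forecaster over the Kaggle wide layout (one row per page, one column per date), using a naive last-value model as a stand-in for PyAF; in the real task each worker would instead train and apply a PyAF engine, and process-based parallelism may be preferable for CPU-bound training.

```python
# Sketch: forecast many series independently and in parallel.
# The "last observed value" model below is only a stand-in for a
# per-series PyAF engine.
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

HORIZON = 60  # days to forecast

def forecast_one(args):
    page, history = args
    # Stand-in model: repeat the last observed value over the horizon.
    last = history.dropna().iloc[-1] if history.notna().any() else 0.0
    return page, [float(last)] * HORIZON

def forecast_all(train_df, max_workers=8):
    # train_df: one row per page, one column per date (Kaggle layout).
    jobs = [(row["Page"], row.drop("Page")) for _, row in train_df.iterrows()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(forecast_one, jobs))
```

Because each series is independent, the worker count can be tuned to the machine; with 100,000+ series, batching the jobs and checkpointing partial results would also be worth adding.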
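For the default-pyaf-hierarchical-forecast-task, PyAF would handle the hierarchy internally, but the simplest reconciliation scheme, bottom-up aggregation, can be sketched to show what "hierarchical" means here: page-level forecasts are summed into project-level forecasts, so both levels of the hierarchy stay consistent by construction. The function and the page-to-project mapping below are illustrative, not PyAF's API.

```python
# Sketch: bottom-up hierarchical reconciliation.
# Page-level forecasts are summed into project-level forecasts, so the
# two levels are consistent by construction. (Richer reconciliation
# schemes exist; this only illustrates the idea.)
import pandas as pd

def bottom_up(page_forecasts, page_to_project):
    # page_forecasts: index = page, columns = future dates.
    # page_to_project: dict mapping each page to its project.
    proj = page_forecasts.groupby(page_forecasts.index.map(page_to_project)).sum()
    proj.index.name = "project"
    return proj
```

In the competition data the project is encoded in the page name, so the page-to-project mapping can be derived by parsing the Page column.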
PyAF is open source. One advantage is that public benchmarks can be used openly and transparently to assess the quality of its forecasts, and Kaggle's Web Traffic Time Series Forecasting competition is a very interesting benchmark for PyAF for the reasons listed above.