The dataset is public and large (visits to Wikipedia projects), covering more than 100,000 time series, each with a daily history of two years.
The forecasting task is real: the data are current (as of 2017-08-12) and the future values are not yet available (the competition score is based on forecasts for next November ;)
The dataset has a very interesting hierarchical forecasting aspect (separate series for each web page inside each project).
While we want to have at least one Kaggle participation before the end of the competition (2017-09-10), we hope it will still be possible to submit forecasts after that deadline, so that future versions/evolutions of PyAF can also be tested.
The goal here is to be able to:
Create a piece of code that reads/parses the data and creates a submission to Kaggle in the right format (a lot of python/numpy/pandas plumbing ;), with a default prediction of zero everywhere (plumbing+zero-everywhere-task).
Adapt the previous code to use a default configuration of PyAF to generate individual forecasts (default-pyaf-individual-forecast-task). Forecasting more than 100,000 signals in a few hours is already a real performance challenge in itself (the forecasts need to be parallelized).
Extend the previous code to use a default configuration of PyAF to generate hierarchical forecasts (default-pyaf-hierarchical-forecast-task).
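For the plumbing+zero-everywhere-task, a minimal sketch of the submission plumbing is shown below. It assumes the competition's key file maps each (page, date) pair to an Id and that the submission format has two columns, Id and Visits; the file names and column names are assumptions to be checked against the competition's data page.

```python
# Sketch of the zero-everywhere Kaggle submission (plumbing only).
# Assumed layout: key_1.csv has columns (Page, Id), one row per
# (page, date) pair; the submission needs columns (Id, Visits).
import pandas as pd

def make_zero_submission(key_csv="key_1.csv", out_csv="submission.csv"):
    keys = pd.read_csv(key_csv)  # one row per (page, date) pair
    # Default prediction: zero visits everywhere.
    sub = pd.DataFrame({"Id": keys["Id"], "Visits": 0})
    sub.to_csv(out_csv, index=False)
    return sub
```

This gives a valid baseline submission against which the real forecasts can be compared.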
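For the default-pyaf-individual-forecast-task, the per-series work is embarrassingly parallel. The sketch below parallelizes a per-series forecaster over the Kaggle wide layout (one row per page, one column per date), using a naive last-value model as a stand-in for PyAF; in the real task each worker would instead train and apply a PyAF engine, and process-based parallelism may be preferable for CPU-bound training.

```python
# Sketch: forecast many series independently and in parallel.
# The "last observed value" model below is only a stand-in for a
# per-series PyAF engine.
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

HORIZON = 60  # days to forecast

def forecast_one(args):
    page, history = args
    # Stand-in model: repeat the last observed value over the horizon.
    last = history.dropna().iloc[-1] if history.notna().any() else 0.0
    return page, [float(last)] * HORIZON

def forecast_all(train_df, max_workers=8):
    # train_df: one row per page, one column per date (Kaggle layout).
    jobs = [(row["Page"], row.drop("Page")) for _, row in train_df.iterrows()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(forecast_one, jobs))
```

Because each series is independent, the worker count can be tuned to the machine; with 100,000+ series, batching the jobs and checkpointing partial results would also be worth adding.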
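For the default-pyaf-hierarchical-forecast-task, PyAF would handle the hierarchy internally, but the simplest reconciliation scheme, bottom-up aggregation, can be sketched to show what "hierarchical" means here: page-level forecasts are summed into project-level forecasts, so both levels of the hierarchy stay consistent by construction. The function and the page-to-project mapping below are illustrative, not PyAF's API.

```python
# Sketch: bottom-up hierarchical reconciliation.
# Page-level forecasts are summed into project-level forecasts, so the
# two levels are consistent by construction. (Richer reconciliation
# schemes exist; this only illustrates the idea.)
import pandas as pd

def bottom_up(page_forecasts, page_to_project):
    # page_forecasts: index = page, columns = future dates.
    # page_to_project: dict mapping each page to its project.
    proj = page_forecasts.groupby(page_forecasts.index.map(page_to_project)).sum()
    proj.index.name = "project"
    return proj
```

In the competition data the project is encoded in the page name, so the page-to-project mapping can be derived by parsing the Page column.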
PyAF is open source. One advantage is that public benchmarks can be used openly and transparently to assess the quality of its forecasts, and Kaggle's Web Traffic Time Series Forecasting competition is a very interesting benchmark for PyAF for the reasons listed above.