antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License

Benchmark : Web Traffic Time Series Forecasting competition #57

Closed antoinecarme closed 3 years ago

antoinecarme commented 7 years ago

PyAF is open source. One advantage is that it is possible to openly and transparently use public benchmarks to assess the quality of forecasts.

Kaggle is releasing a Web Traffic Time Series Forecasting competition. This is a very interesting benchmark for PyAF for many reasons:

  1. The dataset is public and large (Wikipedia project visits): more than 100000 time series, each covering a daily history of 2 years.
  2. The forecasting task is real (present data, as of 2017-08-12) and the future data are not yet available (the competition score is based on forecasts for next November ;)
  3. The dataset has a very interesting hierarchical forecasting aspect (separate series for each web page inside projects).

While we want to have at least one Kaggle participation before the end of the competition (2017-09-10), we hope that it will be possible to submit forecasts after that time limit to be able to test future versions/evolutions of PyAF.

The goal here is to be able to:

  1. Create a piece of code to be able to read/parse data and create a submission to Kaggle in the right format (a lot of python/numpy/pandas plumbing ;) with a default prediction of zero everywhere (plumbing+zero-everywhere-task).
  2. Adapt the previous code to use a default configuration of PyAF to generate individual forecasts (default-pyaf-individual-forecast-task). Forecasting more than 100000 signals in a few hours is already a real performance achievement in itself (it requires parallelizing the forecasts).
  3. Extend the previous code to use a default configuration of PyAF to generate hierarchical forecasts (default-pyaf-hierarchical-forecast-task).
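
A minimal sketch of the first task (plumbing+zero-everywhere-task), using only the standard library so that it runs as-is. Everything about the file layout here is an assumption and should be checked against the real competition files: a wide training table with a `Page` column followed by one column per date, a key table mapping `<page>_<date>` to an `Id`, and a submission with `Id` and `Visits` columns. The tiny inline tables, the page names and all function names are hypothetical. Treating missing cells as zero visits is also an assumption.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical miniature inputs; the column layout is an assumption.
TRAIN_CSV = """Page,2017-08-10,2017-08-11,2017-08-12
PageA_en.wikipedia.org_all-access_spider,3,,5
PageB_fr.wikipedia.org_desktop_all-agents,10,12,11
"""

KEY_CSV = """Page,Id
PageA_en.wikipedia.org_all-access_spider_2017-08-13,id001
PageA_en.wikipedia.org_all-access_spider_2017-08-14,id002
PageB_fr.wikipedia.org_desktop_all-agents_2017-08-13,id003
PageB_fr.wikipedia.org_desktop_all-agents_2017-08-14,id004
"""


def zero_forecast(history, horizon):
    """Default prediction: zero visits for every future day."""
    return [0.0] * horizon


def read_series(train_file):
    """Read the wide table into (sorted date columns, {page: daily values})."""
    reader = csv.DictReader(train_file)
    dates = sorted(c for c in reader.fieldnames if c != "Page")
    series = {}
    for row in reader:
        # Assumption: an empty cell means no recorded visits.
        series[row["Page"]] = [float(row[d]) if row[d] else 0.0 for d in dates]
    return dates, series


def build_submission(train_file, key_file, horizon, forecaster=zero_forecast):
    """Forecast each page independently and map the results to submission ids."""
    dates, series = read_series(train_file)
    last_day = date.fromisoformat(dates[-1])
    future = [(last_day + timedelta(days=k)).isoformat()
              for k in range(1, horizon + 1)]

    forecasts = {}
    for page, history in series.items():
        values = forecaster(history, horizon)
        for day, value in zip(future, values):
            forecasts[f"{page}_{day}"] = value

    # The key file decides which (page, date) pairs are scored, and in what order.
    return [(row["Id"], forecasts[row["Page"]])
            for row in csv.DictReader(key_file)]


def write_submission(rows, out_file):
    """Write the rows in the assumed Id,Visits submission format."""
    writer = csv.writer(out_file)
    writer.writerow(["Id", "Visits"])
    writer.writerows(rows)


rows = build_submission(io.StringIO(TRAIN_CSV), io.StringIO(KEY_CSV), horizon=2)
submission = io.StringIO()
write_submission(rows, submission)
```

The `forecaster` argument is the only piece that the second task would need to replace: any function taking a history and a horizon and returning one value per future day fits, so a PyAF-based forecaster could be plugged in without touching the plumbing. With pandas, the wide-to-long reshaping done by hand here is what `DataFrame.melt` does.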

antoinecarme commented 3 years ago

Too old. Closing.