awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0

Putting together a tutorial using some standard data sets from the literature? #660

Open SkanderHn opened 4 years ago

SkanderHn commented 4 years ago

Description

I've been experimenting with GluonTS and it has been a real pleasure working with this framework. I've tried it on a few textbook time series (Air Passengers, Car Sales, Australian Tourism data), and I was wondering if there would be any benefit to putting that all together in a tutorial, especially with the Australian Tourism data set, since it has become almost a canonical example of grouped time series that can be approached via hierarchical forecasting. The tutorial would demonstrate how to use the hierarchy information as static features for the estimator.

Do you think there is any benefit to such a tutorial, or would it be redundant with the quick start and extended tutorials? Or, if you think other data sets would be better for such a tutorial, I'd be happy to work on contributing that as well.
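To make the "hierarchy information as static features" idea concrete, here is a minimal sketch of how the hierarchy labels of each series could be turned into integer codes, one per hierarchy level. The helper name `encode_hierarchy` and the example labels are hypothetical and only for illustration; the sketch assumes the codes would end up in the `feat_static_cat` field of each dataset entry.

```python
from typing import Dict, List, Sequence, Tuple


def encode_hierarchy(
    labels: Sequence[Tuple[str, ...]],
) -> Tuple[List[List[int]], List[int]]:
    """Map each series' hierarchy labels to integer codes, one code per level.

    Returns the per-series codes and the number of distinct values per level.
    (Hypothetical helper, not part of GluonTS.)
    """
    n_levels = len(labels[0])
    # One vocabulary per hierarchy level, filled in order of first appearance.
    vocabs: List[Dict[str, int]] = [{} for _ in range(n_levels)]
    codes: List[List[int]] = []
    for row in labels:
        codes.append(
            [
                vocabs[level].setdefault(value, len(vocabs[level]))
                for level, value in enumerate(row)
            ]
        )
    cardinality = [len(vocab) for vocab in vocabs]
    return codes, cardinality


# Illustrative (state, region, purpose) labels, one tuple per time series.
labels = [
    ("NSW", "Sydney", "Holiday"),
    ("NSW", "Sydney", "Business"),
    ("VIC", "Melbourne", "Holiday"),
]
codes, cardinality = encode_hierarchy(labels)
```

Each inner list in `codes` would then be attached to the matching series as `feat_static_cat`, and `cardinality` is the kind of value an estimator such as DeepAR expects alongside `use_feat_static_cat=True`. Whether this encoding is a good way to exploit the hierarchy is exactly what a tutorial could explore.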


StatMixedML commented 4 years ago

I believe that is a very nice idea for illustrating the functionality of GluonTS. Let me add that we should have a notebook for each available estimator, e.g., DeepAR, DeepGP, DeepState, DeepFactor. I have been using the Australian tourism data to test some of these models over the last week. I can volunteer to share the notebooks so that they can be used as a working template. Code-wise they might not be the most elegant, but we can all contribute and improve them. See also this related issue: https://github.com/awslabs/gluon-ts/issues/647#issue-566361839.

geoalgo commented 4 years ago

It is a very good idea. It would also be very nice to write a wrapper so that we can access those datasets with:

from gluonts.dataset.repository.datasets import get_dataset
dataset = get_dataset("tourism")
# ... train your favorite estimator

We have already added the M4 datasets and a number of others to this wrapper; adding these datasets as well would make them easily available for benchmarks.

SkanderHn commented 4 years ago

@StatMixedML I'm curious if you would share how you take the original Australian tourism CSV and parse it into a GluonTS data set format?

I myself have been using some very hacky and not at all elegant pandas manipulations, along the lines of:

I'm wondering if there is a more straightforward and elegant way to do so?
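One possible approach, sketched under assumptions: the CSV layout below (a `quarter_start` date column plus one column per series) is hypothetical and not the layout of the original tourism file, and `wide_csv_to_entries` is an illustrative helper, not a GluonTS function. The sketch uses only the standard library and produces one dictionary per series with the `item_id`, `start`, and `target` fields that GluonTS dataset entries use.

```python
import csv
import io
from typing import Dict, List

# Hypothetical wide-format extract: one row per quarter, one column per series.
RAW_CSV = """quarter_start,NSW_Holiday,VIC_Holiday
1998-01-01,10.5,7.0
1998-04-01,11.0,6.5
1998-07-01,9.5,8.0
"""


def wide_csv_to_entries(text: str, date_column: str = "quarter_start") -> List[Dict]:
    """Turn a wide CSV (dates in rows, series in columns) into per-series entries."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # All series share the first timestamp; this assumes rows are sorted by date.
    start = rows[0][date_column]
    series_names = [name for name in rows[0] if name != date_column]
    return [
        {
            "item_id": name,
            "start": start,
            "target": [float(row[name]) for row in rows],
        }
        for name in series_names
    ]


entries = wide_csv_to_entries(RAW_CSV)
```

A list like `entries` should be usable as the data argument of `ListDataset` from `gluonts.dataset.common`, together with a matching quarterly `freq` string. The same reshaping can be written with pandas by iterating over the columns of a date-indexed DataFrame.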

StatMixedML commented 4 years ago

@SkanderHn I hope that I can share the notebook some time next week; I didn't have time last week.

@geoalgo What is the best way of sharing notebooks? Via PR?

kaijennissen commented 3 years ago

@StatMixedML Any progress with the notebooks? I think they would be really helpful for a lot of users.