Weighted sampling of train sequences.

awslabs / gluonts

Probabilistic time series modeling in Python

https://ts.gluon.ai

Apache License 2.0


Weighted sampling of train sequences. #612

Open AaronSpieler opened 4 years ago

AaronSpieler commented 4 years ago

Description

I propose that we add support for weighted sampling in the InstanceSplitter, plus an option to set a cutoff date before which no training instances are sampled. It should not be a breaking change, as we can set the default cutoff to None and the default weight distribution to uniform.
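To make the proposal concrete, here is a minimal sketch of the sampling logic in plain Python. It is not the InstanceSplitter API: the function name `sample_split_indices` and its parameters are hypothetical, and the cutoff is expressed as an integer time index rather than a date to keep the example self-contained. The defaults (`weights=None`, `cutoff=None`) correspond to the uniform, no-cutoff behaviour described above.

```python
import random
from typing import List, Optional, Sequence


def sample_split_indices(
    length: int,
    num_samples: int,
    weights: Optional[Sequence[float]] = None,
    cutoff: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Draw split indices from range(length) (hypothetical helper).

    weights=None means uniform weights; cutoff=None means no restriction.
    Indices strictly below `cutoff` are never sampled.
    """
    if weights is None:
        weights = [1.0] * length
    if len(weights) != length:
        raise ValueError("weights must have one entry per time step")

    # Only indices at or after the cutoff are eligible.
    start = 0 if cutoff is None else max(0, cutoff)
    candidates = list(range(start, length))
    candidate_weights = [weights[i] for i in candidates]

    # Nothing to sample if the cutoff excludes everything or all weights are zero.
    if not candidates or sum(candidate_weights) <= 0:
        return []

    rng = random.Random(seed)
    # Sampling with replacement, proportional to the given weights.
    return rng.choices(candidates, weights=candidate_weights, k=num_samples)
```

With a date-based cutoff, the same idea would apply after converting the date to an index relative to the series start.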

References

InstanceSplitter in GluonTS: https://gluon-ts.mxnet.io/api/gluonts/gluonts.transform.html?highlight=instancesplitter#gluonts.transform.InstanceSplitter
CutOff parameter used in N-BEATS: https://arxiv.org/abs/1905.10437
Weighted sampling in PyTorch: https://pytorch.org/docs/stable/data.html#torch.utils.data.WeightedRandomSampler

mbohlkeschneider commented 4 years ago

+1 on that one!

konradsemsch commented 4 years ago

Is there currently any way to influence how frequently individual time series are sampled or weighted during model training? This paper: link discusses that (more frequent sampling of rare time series that exhibit an overall higher volume of transactions), but I'm not sure how to translate that into code in GluonTS's current state. Any hints on how to go about that?

AaronSpieler commented 4 years ago

@konradsemsch 1) Anything that is sampling-related and based on specific properties of the time series itself is done in the InstanceSplitter, so you might find the functionality you are looking for there by passing certain parameters; see e.g. the train_sampler parameter. For example, with a specific train sampler you can control how many training examples are generated from a time series based on its length. 2) Other than that, an easy fix could be to preemptively replicate time series in your train set. 3) Regarding weighting the loss, I'm not entirely sure, but I don't think we have that yet.
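A rough sketch of option 2, replicating series in the train set before training. The helper `replicate_by_weight` and the copy counts are hypothetical, and the entries are plain dicts with illustrative `item_id`, `start` and `target` fields; how the copy counts are chosen (e.g. from transaction volume) is left open.

```python
from typing import Dict, List, Sequence


def replicate_by_weight(entries: Sequence[Dict], copies: Sequence[int]) -> List[Dict]:
    """Return a new list where entry i appears copies[i] times (hypothetical helper)."""
    if len(entries) != len(copies):
        raise ValueError("need exactly one copy count per entry")
    replicated: List[Dict] = []
    for entry, n in zip(entries, copies):
        # Shallow-copy each dict so the replicated entries are independent objects.
        replicated.extend(dict(entry) for _ in range(n))
    return replicated


# Illustrative data: oversample the high-volume series three times.
train = [
    {"item_id": "low_volume", "start": "2020-01-01", "target": [1.0, 2.0, 3.0]},
    {"item_id": "high_volume", "start": "2020-01-01", "target": [100.0, 120.0, 90.0]},
]
oversampled = replicate_by_weight(train, copies=[1, 3])
```

The replicated list can then be used wherever the original train entries were used. Note that this increases the dataset size in memory, which is the trade-off compared to weighting inside the sampler.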

cscheidiger commented 3 years ago

+1 that'd be very helpful!