Closed Guan-t7 closed 2 years ago
Hi @Guan-t7, thanks for the interest in our paper! Regarding your comments:
We follow N-BEATS' training strategy, which samples random windows at each step and doesn't necessarily cover the entire training set. We have used this strategy extensively, and covering the entire training set doesn't improve performance in most cases.
Traffic has more than 800 time series; are you feeding the model the history of all of them? If so, the input vector is larger than 70k entries, and a feed-forward network is not the best architecture for learning interactions between that many inputs.
Yes, we realized there was a mistake with the scheduling, in particular on the datasets with more time series, such as ECL and Traffic. We plan to fix this soon.
Yes, 1000 iterations is conservative for a dataset like Traffic. However, we wanted to keep the search space constant across the 6 datasets of the paper. Performance can be further improved by tuning hyperparameters for each dataset separately.
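For reference, the random-window sampling strategy discussed above can be sketched roughly as follows. This is a minimal sketch for a single univariate series, with hypothetical `input_size`/`output_size` parameter names; it is not the repo's actual implementation:

```python
import numpy as np

def sample_windows(series, n_windows, input_size, output_size, rng):
    """Randomly sample (input, target) windows from one univariate series.

    N-BEATS-style sampler sketch: window start points are drawn uniformly at
    random, so a short training run need not cover the whole training span.
    """
    total = input_size + output_size
    starts = rng.integers(0, len(series) - total + 1, size=n_windows)
    inputs = np.stack([series[s : s + input_size] for s in starts])
    targets = np.stack([series[s + input_size : s + total] for s in starts])
    return inputs, targets

rng = np.random.default_rng(0)
series = np.arange(1000, dtype=float)
x, y = sample_windows(series, n_windows=256, input_size=96, output_size=24, rng=rng)
# x has shape (256, 96); y has shape (256, 24)
```

With 256 windows per step and 1000 steps, coverage of long series is probabilistic rather than exhaustive, which is the trade-off being discussed.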
Cool. Thanks for your reply!
Update: Additional questions

1. Your data pipeline seems quite non-traditional to me. At each training step you randomly sample `256` windows from one time series as model input, and a training epoch finishes after sampling from each series once. I understand that it's a univariate model, but I don't see why you leave it to probability to cover the entire training span. I tried an ablation feeding the data in multivariate fashion, i.e. inputting a history of all variables, rolling windows along the time dimension, and learning `(N, S) -> (N, T)` where `N == num_series`. The result was bad on the `traffic` dataset. Could you help explain?
2. The paper says that the lr is *halved three times across the training procedure*. However, you mis-configured your pl_module. The default lr-schedule interval is `epoch` (ref. https://pytorch-lightning.readthedocs.io/en/latest/common/lightning_module.html#configure-optimizers), which means that you actually kept training with the initial lr until the end.
3. You chose 1000 training steps, which is conservative considering your data feeding. For example, each time series is covered at most twice on the `traffic` dataset. Training for more steps slightly improved over your reported results on the `traffic` dataset (at least).

I hope these could help improve your model (of course the metric presented is already impressive enough :).
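The interval mismatch in the scheduler point can be illustrated with a small simulation (a sketch with hypothetical milestone values, not the repo's actual configuration): a `MultiStepLR`-style schedule whose milestones are expressed in optimizer steps never fires when the scheduler is stepped only once per epoch.

```python
def lr_after(n_scheduler_steps, milestones, base_lr=1e-3, gamma=0.5):
    """LR produced by a MultiStepLR-like schedule after a given number of
    scheduler.step() calls: halve once per milestone passed."""
    passed = sum(1 for m in milestones if n_scheduler_steps >= m)
    return base_lr * gamma ** passed

total_steps = 1000            # optimizer steps in the run
n_epochs = 10                 # hypothetical: 100 steps per epoch
milestones = [250, 500, 750]  # hypothetical halving points, in *steps*

# Scheduler stepped once per optimizer step: all three halvings happen.
lr_per_step = lr_after(total_steps, milestones)   # 1e-3 * 0.5**3

# Scheduler stepped once per epoch (PyTorch Lightning's default interval):
# only 10 step() calls, so no milestone is reached and lr stays at 1e-3.
lr_per_epoch = lr_after(n_epochs, milestones)
```

In Lightning the fix is to return the scheduler from `configure_optimizers` as a dict with `"interval": "step"`, e.g. `{"optimizer": opt, "lr_scheduler": {"scheduler": sched, "interval": "step"}}`, so `step()` is called once per optimizer step.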
===============================================
Thank you for this amazing work. I found these typo and doc issues:

1. https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L398-L405: `n_time_in` is actually the final lookback period.
2. https://github.com/cchallu/n-hits/blob/4e929ed31e1d3ff5169b4aa0d3762a0040abb8db/src/models/nhits/nhits.py#L248-L250: `n_layers` in `nhits_multivariate.py` should be `[ 3*[2] ]` rather than 9, since its elements are indexed across the 3 stacks.
3. `loss_hypar` should be an `int` like 7 or 24, judging from its context.
4. There is bypassed logic for exogenous variables in the nhits model. I wonder if it can be put to work now?
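A quick sketch of the `n_layers` point (with hypothetical names; the real config lives in `nhits_multivariate.py`): the hyperparameter is indexed per stack, so a bare scalar like 9 breaks, while a nested list of per-stack layer counts works.

```python
# Hypothetical illustration of a per-stack layer-count hyperparameter.
n_stacks = 3

# Suggested value: a list whose single element holds one count per stack.
n_layers = [3 * [2]]        # -> [[2, 2, 2]]
per_stack = n_layers[0]

# Each stack looks up its own layer count; a scalar 9 would fail here.
layer_counts = [per_stack[stack_id] for stack_id in range(n_stacks)]
```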