Nixtla / neuralforecast

Scalable and user-friendly neural forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True #1061

Closed: camongman closed this issue 2 weeks ago

camongman commented 1 month ago

What happened + What you expected to happen

Hi, I've been struggling with this issue for a few days. I'm using neuralforecast to predict EC2 resource usage on the cloud, with a test set made up of 37 dataframes. To keep things simple, I pulled some of that data into a single dataframe (df) whose per-series row counts look like this:

    unique_id
    0     8784
    1     8784
    2     8784
    5     8784
    6     8784
    9     8784
    12    8784
    13    8784
    14    8784
    20    8784
    31    8784
    32    8784
    33    8784
    34    8784
    35    8784
    36    8784
    Name: count, dtype: int64

The training code looks like this:

    horizon = 12  # 12 hours

    models = [
        NHITS(h=horizon,                 # Forecast horizon
              input_size=horizon,        # Length of input sequence
              max_steps=10,
              start_padding_enabled=True,
              )
    ]

    cpu_nf = NeuralForecast(models=[models[2]], freq='1H')  # NHITS
    cpu_nf.fit(df=train_df, val_size=int(0.1 * len(train_df)))

Training succeeds when I use only 2 or 3 dataframes, but with 10 or more dataframes I get the error below. I also tried reducing input_size to a small value such as 2 or 4, as your AI assistant suggests, but the result was the same.
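One thing that may be worth checking: as far as I can tell, val_size in NeuralForecast.fit is counted in time steps per series, so int(0.1 * len(train_df)) grows with the number of concatenated series and can exceed the 8784 points each series actually has once 10+ dataframes are included. A rough diagnostic sketch along those lines (train_df is the frame passed to fit above, and the per-series minimum below is my own estimate, not something taken from the library):

    horizon = 12
    input_size = 12
    val_size = int(0.1 * len(train_df))          # the value passed to fit()
    rough_min = input_size + horizon + val_size  # rough per-series minimum without padding

    lengths = train_df.groupby("unique_id").size()
    print("val_size passed to fit:", val_size)
    print("shortest series:", lengths.min())
    print("series shorter than the rough minimum:")
    print(lengths[lengths < rough_min])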

Thanks.

Versions / Dependencies

neuralforecast==1.7.3

Reproduction script

    horizon = 12  # 12 hours

    models = [
        NHITS(h=horizon,                 # Forecast horizon
              input_size=horizon,        # Length of input sequence
              max_steps=10,
              start_padding_enabled=True,
              )
    ]

    cpu_nf = NeuralForecast(models=[models[2]], freq='1H')  # NHITS
    cpu_nf.fit(df=train_df, val_size=int(0.1 * len(train_df)))
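For completeness, here is a self-contained sketch with purely synthetic data that, on my reading, should hit the same exception: with 16 concatenated series of 8784 hourly points each, int(0.1 * len(df)) is roughly 14,000 time steps, more than any single series contains.

    import numpy as np
    import pandas as pd
    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    # Synthetic stand-in for the EC2 data: 16 hourly series of 8784 random points each.
    df = pd.concat(
        [
            pd.DataFrame(
                {
                    "unique_id": uid,
                    "ds": pd.date_range("2024-01-01", periods=8784, freq="1H"),
                    "y": np.random.rand(8784),
                }
            )
            for uid in range(16)
        ],
        ignore_index=True,
    )

    horizon = 12
    nf = NeuralForecast(
        models=[NHITS(h=horizon, input_size=horizon, max_steps=10, start_padding_enabled=True)],
        freq="1H",
    )

    # int(0.1 * len(df)) is about 14,054 here, larger than the 8784 points per series,
    # which (if val_size is applied per series) leaves no training data and should raise
    # the "Time series is too short for training" exception.
    nf.fit(df=df, val_size=int(0.1 * len(df)))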

Part of the error message:

{ "name": "Exception", "message": "Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True", "stack": "--------------------------------------------------------------------------- Exception Traceback (most recent call last) Cell In[22], line 5 2 cpu_nf = NeuralForecast(models=[models[2]], freq='1H') # NHITS 3 # cpu_nf = NeuralForecast(models=models, freq='min') # NHITS 4 # cpu_nf.fit(df=splitted_train_df, static_df=static_df, val_size=10000) ----> 5 cpu_nf.fit(df=train_df, val_size=int(0.1 * len(train_df)))

File /opt/conda/lib/python3.10/site-packages/neuralforecast/core.py:486, in NeuralForecast.fit(self, df, static_df, val_size, sort_df, use_init_models, verbose, id_col, time_col, target_col, distributed_config) 483 self._reset_models() 485 for i, model in enumerate(self.models): --> 486 self.models[i] = model.fit( 487 self.dataset, val_size=val_size, distributed_config=distributed_config 488 ) 490 self._fitted = True

File /opt/conda/lib/python3.10/site-packages/neuralforecast/common/_base_windows.py:661, in BaseWindows.fit(self, dataset, val_size, test_size, random_seed, distributed_config) 632 def fit( 633 self, 634 dataset, (...) 638 distributed_config=None, 639 ): 640 \"\"\"Fit. 641 642 The fit method, optimizes the neural network's weights using the (...) 659 test_size: int, test size for temporal cross-validation.
660 \"\"\" --> 661 return self._fit( 662 dataset=dataset, 663 batch_size=self.batch_size, 664 valid_batch_size=self.valid_batch_size, 665 val_size=val_size, 666 test_size=test_size, 667 random_seed=random_seed, 668 distributed_config=distributed_config, 669 )

File /opt/conda/lib/python3.10/site-packages/neuralforecast/common/_base_model.py:357, in BaseModel._fit(self, dataset, batch_size, valid_batch_size, val_size, test_size, random_seed, shuffle_train, distributed_config) 355 model = self 356 trainer = pl.Trainer(**model.trainer_kwargs) --> 357 trainer.fit(model, datamodule=datamodule) 358 model.metrics = trainer.callback_metrics 359 model.dict.pop(\"_trainer\", None)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:544, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path) 542 self.state.status = TrainerStatus.RUNNING 543 self.training = True --> 544 call._call_and_handle_interrupt( 545 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path 546 )

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, kwargs) 42 if trainer.strategy.launcher is not None: 43 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, *kwargs) ---> 44 return trainer_fn(args, kwargs) 46 except _TunerExitException: 47 _call_teardown_hook(trainer)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:580, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path) 573 assert self.state.fn is not None 574 ckpt_path = self._checkpoint_connector._select_ckpt_path( 575 self.state.fn, 576 ckpt_path, 577 model_provided=True, 578 model_connected=self.lightning_module is not None, 579 ) --> 580 self._run(model, ckpt_path=ckpt_path) 582 assert self.state.stopped 583 self.training = False

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:989, in Trainer._run(self, model, ckpt_path) 984 self._signal_connector.register_signal_handlers() 986 # ---------------------------- 987 # RUN THE TRAINER 988 # ---------------------------- --> 989 results = self._run_stage() 991 # ---------------------------- 992 # POST-Training CLEAN UP 993 # ---------------------------- 994 log.debug(f\"{self.class.name}: trainer tearing down\")

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py:1035, in Trainer._run_stage(self) 1033 self._run_sanity_check() 1034 with torch.autograd.set_detect_anomaly(self._detect_anomaly): -> 1035 self.fit_loop.run() 1036 return None 1037 raise RuntimeError(f\"Unexpected state {self.state}\")

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:202, in _FitLoop.run(self) 200 try: 201 self.on_advance_start() --> 202 self.advance() 203 self.on_advance_end() 204 self._restarting = False

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/fit_loop.py:359, in _FitLoop.advance(self) 357 with self.trainer.profiler.profile(\"run_training_epoch\"): 358 assert self._data_fetcher is not None --> 359 self.epoch_loop.run(self._data_fetcher)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:136, in _TrainingEpochLoop.run(self, data_fetcher) 134 while not self.done: 135 try: --> 136 self.advance(data_fetcher) 137 self.on_advance_end(data_fetcher) 138 self._restarting = False

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/training_epoch_loop.py:240, in _TrainingEpochLoop.advance(self, data_fetcher) 237 with trainer.profiler.profile(\"run_training_batch\"): 238 if trainer.lightning_module.automatic_optimization: 239 # in automatic optimization, there can only be one optimizer --> 240 batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs) 241 else: 242 batch_output = self.manual_optimization.run(kwargs)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py:187, in _AutomaticOptimization.run(self, optimizer, batch_idx, kwargs) 180 closure() 182 # ------------------------------ 183 # BACKWARD PASS 184 # ------------------------------ 185 # gradient update with accumulated gradients 186 else: --> 187 self._optimizer_step(batch_idx, closure) 189 result = closure.consume_result() 190 if result.loss is None:

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py:265, in _AutomaticOptimization._optimizer_step(self, batch_idx, train_step_and_backward_closure) 262 self.optim_progress.optimizer.step.increment_ready() 264 # model hook --> 265 call._call_lightning_module_hook( 266 trainer, 267 \"optimizer_step\", 268 trainer.current_epoch, 269 batch_idx, 270 optimizer, 271 train_step_and_backward_closure, 272 ) 274 if not should_accumulate: 275 self.optim_progress.optimizer.step.increment_completed()

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:157, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, *kwargs) 154 pl_module._current_fx_name = hook_name 156 with trainer.profiler.profile(f\"[LightningModule]{pl_module.class.name}.{hook_name}\"): --> 157 output = fn(args, **kwargs) 159 # restore current_fx when nested context 160 pl_module._current_fx_name = prev_fx_name

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/module.py:1291, in LightningModule.optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure) 1252 def optimizer_step( 1253 self, 1254 epoch: int, (...) 1257 optimizer_closure: Optional[Callable[[], Any]] = None, 1258 ) -> None: 1259 r\"\"\"Override this method to adjust the default way the :class:~pytorch_lightning.trainer.trainer.Trainer calls 1260 the optimizer. 1261 (...) 1289 1290 \"\"\" -> 1291 optimizer.step(closure=optimizer_closure)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/core/optimizer.py:151, in LightningOptimizer.step(self, closure, kwargs) 148 raise MisconfigurationException(\"When optimizer.step(closure) is called, the closure should be callable\") 150 assert self._strategy is not None --> 151 step_output = self._strategy.optimizer_step(self._optimizer, closure, kwargs) 153 self._on_after_step() 155 return step_output

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py:230, in Strategy.optimizer_step(self, optimizer, closure, model, kwargs) 228 # TODO(fabric): remove assertion once strategy's optimizer_step typing is fixed 229 assert isinstance(model, pl.LightningModule) --> 230 return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, kwargs)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py:117, in Precision.optimizer_step(self, optimizer, model, closure, kwargs) 115 \"\"\"Hook to run the optimizer step.\"\"\" 116 closure = partial(self._wrap_closure, model, optimizer, closure) --> 117 return optimizer.step(closure=closure, kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:75, in LRScheduler.init..with_counter..wrapper(*args, *kwargs) 73 instance._step_count += 1 74 wrapped = func.get(instance, cls) ---> 75 return wrapped(args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py:385, in Optimizer.profile_hook_step..wrapper(*args, *kwargs) 380 else: 381 raise RuntimeError( 382 f\"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}.\" 383 ) --> 385 out = func(args, **kwargs) 386 self._optimizer_step_code() 388 # call optimizer step post hooks

File /opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py:76, in _use_grad_for_differentiable.._use_grad(self, *args, *kwargs) 74 torch.set_grad_enabled(self.defaults['differentiable']) 75 torch._dynamo.graph_break() ---> 76 ret = func(self, args, **kwargs) 77 finally: 78 torch._dynamo.graph_break()

File /opt/conda/lib/python3.10/site-packages/torch/optim/adam.py:146, in Adam.step(self, closure) 144 if closure is not None: 145 with torch.enable_grad(): --> 146 loss = closure() 148 for group in self.param_groups: 149 params_with_grad = []

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/plugins/precision/precision.py:104, in Precision._wrap_closure(self, model, optimizer, closure) 91 def _wrap_closure( 92 self, 93 model: \"pl.LightningModule\", 94 optimizer: Optimizer, 95 closure: Callable[[], Any], 96 ) -> Any: 97 \"\"\"This double-closure allows makes sure the closure is executed before the on_before_optimizer_step 98 hook is called. 99 (...) 102 103 \"\"\" --> 104 closure_result = closure() 105 self._after_closure(model, optimizer) 106 return closure_result

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py:140, in Closure.call(self, *args, kwargs) 139 def call(self, *args: Any, *kwargs: Any) -> Optional[Tensor]: --> 140 self._result = self.closure(args, kwargs) 141 return self._result.loss

File /opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator..decorate_context(*args, kwargs) 112 @functools.wraps(func) 113 def decorate_context(*args, *kwargs): 114 with ctx_factory(): --> 115 return func(args, kwargs)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py:126, in Closure.closure(self, *args, *kwargs) 124 @torch.enable_grad() 125 def closure(self, args: Any, **kwargs: Any) -> ClosureResult: --> 126 step_output = self._step_fn() 128 if step_output.closure_loss is None: 129 self.warning_cache.warn(\"training_step returned None. If this was on purpose, ignore this warning...\")

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/loops/optimization/automatic.py:315, in _AutomaticOptimization._training_step(self, kwargs) 312 trainer = self.trainer 314 # manually capture logged metrics --> 315 training_step_output = call._call_strategy_hook(trainer, \"training_step\", *kwargs.values()) 316 self.trainer.strategy.post_training_step() # unused hook - call anyway for backward compatibility 318 return self.output_result_cls.from_training_step_output(training_step_output, trainer.accumulate_grad_batches)

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py:309, in _call_strategy_hook(trainer, hook_name, *args, *kwargs) 306 return None 308 with trainer.profiler.profile(f\"[Strategy]{trainer.strategy.class.name}.{hook_name}\"): --> 309 output = fn(args, **kwargs) 311 # restore current_fx when nested context 312 pl_module._current_fx_name = prev_fx_name

File /opt/conda/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py:382, in Strategy.training_step(self, *args, kwargs) 380 if self.model != self.lightning_module: 381 return self._forward_redirection(self.model, self.lightning_module, \"training_step\", *args, *kwargs) --> 382 return self.lightning_module.training_step(args, kwargs)

File /opt/conda/lib/python3.10/site-packages/neuralforecast/common/_base_windows.py:396, in BaseWindows.training_step(self, batch, batch_idx) 394 def training_step(self, batch, batch_idx): 395 # Create and normalize windows [Ws, L+H, C] --> 396 windows = self._create_windows(batch, step=\"train\") 397 y_idx = batch[\"y_idx\"] 398 original_outsample_y = torch.clone(windows[\"temporal\"][:, -self.h :, y_idx])

File /opt/conda/lib/python3.10/site-packages/neuralforecast/common/_base_windows.py:150, in BaseWindows._create_windows(self, batch, step, w_idxs) 148 temporal = self.padder_train(temporal) 149 if temporal.shape[-1] < window_size: --> 150 raise Exception( 151 \"Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True\" 152 ) 153 windows = temporal.unfold( 154 dimension=-1, size=window_size, step=self.step_size 155 ) 157 # [B, C, Ws, L+H] 0, 1, 2, 3 158 # -> [B * Ws, L+H, C] 0, 2, 3, 1

Exception: Time series is too short for training, consider setting a smaller input size or set start_padding_enabled=True" }
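From the last frame above, the check that raises compares the (padded) training length of each series against window_size = input_size + h. A back-of-the-envelope with the numbers from this issue (my own arithmetic, assuming the 16-series frame shown earlier):

    input_size, h = 12, 12
    window_size = input_size + h                     # 24 points needed per training window
    points_per_series = 8784
    val_size = int(0.1 * (16 * points_per_series))   # 14054 once 16 series are concatenated
    print(points_per_series - val_size)              # negative: nothing left to train on per series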

Issue Severity

High: It blocks me from completing my task.

elephaint commented 1 month ago

I would guess that, since only some dataframes fail, there is an issue with some of those dataframes. As they are not provided I can't verify this, but I would advise checking, for every dataframe, that it has valid values and columns. Try to isolate the dataframe that causes the issue (by repeatedly eliminating dataframes), then investigate the dataframe(s) that fail. What is different about them compared to the dataframes that don't fail?
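A rough sketch of that elimination loop, in case it helps (dfs stands in for the list of 37 per-resource dataframes and the NHITS settings are copied from the issue; adjust names to your setup):

    from neuralforecast import NeuralForecast
    from neuralforecast.models import NHITS

    failures = {}
    for i, frame in enumerate(dfs):  # dfs: hypothetical list of the 37 dataframes
        nf = NeuralForecast(
            models=[NHITS(h=12, input_size=12, max_steps=10, start_padding_enabled=True)],
            freq="1H",
        )
        try:
            nf.fit(df=frame, val_size=int(0.1 * len(frame)))
        except Exception as exc:
            failures[i] = str(exc)  # record which dataframes fail and why

    print("failing dataframes:", sorted(failures))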

Let me know if the above procedure diagnosed the issue.

github-actions[bot] commented 2 weeks ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.