Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93 #987

Closed. LeonTing1010 closed this issue 3 months ago.

LeonTing1010 commented 5 months ago

What happened + What you expected to happen

```
(_train_tune pid=59932) /Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/integration/pytorch_lightning.py:198: ray.tune.integration.pytorch_lightning.TuneReportCallback is deprecated. Use ray.tune.integration.pytorch_lightning.TuneReportCheckpointCallback instead.
(_train_tune pid=59932) Seed set to 1
2024-05-01 01:27:11,649 ERROR tune_controller.py:1331 -- Trial task failed for trial _train_tune_793c7e93
Traceback (most recent call last):
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/execution/_internal/event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 2623, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::ImplicitFunc.train() (pid=59932, ip=127.0.0.1, actor_id=b48464a8f9278052285d8c3c01000000, repr=_train_tune)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 330, in train
    raise skipped from exception_cause(skipped)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/air/_internal/util.py", line 98, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 45, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/function_trainable.py", line 253, in _trainable_func
    output = fn()
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 130, in inner
    return trainable(config, **fn_kwargs)
  File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 209, in _train_tune
    _ = self._fit_model(
  File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_auto.py", line 357, in _fit_model
    model = model.fit(
  File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_multivariate.py", line 537, in fit
    return self._fit(
  File "/Users/leo/web3/LLM/langchain/neuralforecast/neuralforecast/common/_base_model.py", line 218, in _fit
    trainer = pl.Trainer(**model.trainer_kwargs)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 431, in __init__
    self._callback_connector.on_trainer_init(
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 79, in on_trainer_init
    _validate_callbacks_list(self.trainer.callbacks)
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in _validate_callbacks_list
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/trainer/connectors/callback_connector.py", line 227, in <listcomp>
    stateful_callbacks = [cb for cb in callbacks if is_overridden("state_dict", instance=cb)]
  File "/Users/leo/web3/LLM/langchain/venv/lib/python3.10/site-packages/pytorch_lightning/utilities/model_helpers.py", line 42, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent
```
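
For context, `pytorch_lightning.utilities.model_helpers.is_overridden` raises this `ValueError` when it cannot infer a parent class for the object it is given, i.e. when one of the Trainer's callbacks is not recognized as a `pytorch_lightning.Callback` instance. This can happen, for example, when a callback class comes from a library built against an incompatible pytorch_lightning / ray combination. A minimal sketch (with a hypothetical `NotACallback` class, not the actual object from this trace) that reproduces the same error outside of neuralforecast:

```python
from pytorch_lightning.utilities.model_helpers import is_overridden


class NotACallback:
    """Hypothetical class: defines state_dict but does not inherit from pytorch_lightning.Callback."""

    def state_dict(self):
        return {}


# is_overridden cannot map this object to LightningModule, LightningDataModule,
# or Callback, so it raises the same ValueError seen in the traceback above.
try:
    is_overridden("state_dict", instance=NotACallback())
except ValueError as err:
    print(err)  # Expected a parent
```

In the trace above, the object being checked is one of the Trainer callbacks assembled from `trainer_kwargs`, so an incompatible callback class in the environment, rather than the reproduction script itself, is a plausible culprit.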

Versions / Dependencies

```
Name: neuralforecast
Version: 1.7.1
Summary: Time series forecasting suite using deep learning models
Home-page: https://github.com/Nixtla/neuralforecast/
Author: Nixtla
Author-email: business@nixtla.io
License: Apache Software License 2.0
```

Reproduction script

```python
Y_hat_df = nf.cross_validation(df=Y_train_df, val_size=val_size, test_size=test_size, n_windows=None)
```

Issue Severity

High: It blocks me from completing my task.

LeonTing1010 commented 5 months ago

```python
from neuralforecast.auto import AutoTSMixer, AutoTSMixerx
from ray.tune.search.hyperopt import HyperOptSearch
from ray import tune
from neuralforecast.losses.numpy import mse, mae
import matplotlib.pyplot as plt
import pandas as pd

from datasetsforecast.long_horizon import LongHorizon
from neuralforecast.core import NeuralForecast
from neuralforecast.models import TSMixer, TSMixerx, NHITS, MLPMultivariate, iTransformer
from neuralforecast.losses.pytorch import MSE, MAE

# Change this to your own data to try the model
Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# X_df contains the exogenous features, which we add to Y_df
X_df['ds'] = pd.to_datetime(X_df['ds'])
Y_df = Y_df.merge(X_df, on=['unique_id', 'ds'], how='left')

# We make validation and test splits
n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)
horizon = 96
input_size = 512

tsmixer_config = {
    "input_size": input_size,                        # Size of input window
    "max_steps": tune.choice([500, 1000, 2000]),     # Number of training iterations
    "val_check_steps": 100,                          # Compute validation every x steps
    "early_stop_patience_steps": 5,                  # Early stopping steps
    "learning_rate": tune.loguniform(1e-4, 1e-2),    # Initial learning rate
    "n_block": tune.choice([1, 2, 4, 6, 8]),         # Number of mixing layers
    "dropout": tune.uniform(0.0, 0.99),              # Dropout
    "ff_dim": tune.choice([32, 64, 128]),            # Dimension of the feature linear layer
    "scaler_type": 'identity',
}

tsmixerx_config = tsmixer_config.copy()
tsmixerx_config['futr_exog_list'] = ['ex_1', 'ex_2', 'ex_3', 'ex_4']

modelx = AutoTSMixerx(h=horizon,
                      n_series=7,
                      loss=MAE(),
                      config=tsmixerx_config,
                      num_samples=10,
                      search_alg=HyperOptSearch(),
                      backend='ray',
                      valid_loss=MAE())

nf = NeuralForecast(models=[modelx], freq='15min')
Y_hat_df = nf.cross_validation(df=Y_df, val_size=val_size, test_size=test_size, n_windows=None)
print(nf.models[0].results.get_best_result().config)

y_true = Y_hat_df.y.values
y_hat_tsmixerx = Y_hat_df['AutoTSMixerx'].values

print(f'MAE TSMixerx: {mae(y_hat_tsmixerx, y_true):.3f}')
print(f'MSE TSMixerx: {mse(y_hat_tsmixerx, y_true):.3f}')
```

elephaint commented 5 months ago

Thanks - this is weird; if I run your code, it runs without any issue.

Can you give more details about the machine config (OS, Python) you are using? How are you running this script?

If I had to guess, it's a package conflict issue, so I would create a new virtual environment, install neuralforecast in that environment, and try rerunning the script.
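
For reference, a small diagnostic sketch that prints the Python/OS details asked for above and the installed versions of the packages most likely involved (the package list is an assumption; adjust it for your environment). Running it both in the current environment and in a fresh virtual environment makes it easy to spot which versions differ:

```python
# Diagnostic sketch: report OS, Python, and installed package versions.
import platform
from importlib.metadata import PackageNotFoundError, version

print("OS:", platform.platform())
print("Python:", platform.python_version())

# Assumed list of relevant packages; extend as needed.
for pkg in ["neuralforecast", "ray", "pytorch-lightning", "torch", "datasetsforecast"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```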

github-actions[bot] commented 3 months ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.