Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Bug with ray tune.choice() #807

Closed BowlOfFruits closed 10 months ago

BowlOfFruits commented 10 months ago

What happened + What you expected to happen

Hello, I've been trying to use AutoNBEATS and AutoNHITS by following the 'Getting Started' guide in the README, but I keep running into an error with tune.choice(), even though the ray package is installed. Here are the error details:

2023-11-06 09:43:18,701 INFO worker.py:1633 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
2023-11-06 09:43:26,974 INFO tune.py:228 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Tuner(...)`.
2023-11-06 09:43:26,996 INFO tune.py:654 -- [output] This will use the new output engine with verbosity 0. To disable the new output and use the legacy output engine, set the environment variable RAY_AIR_NEW_OUTPUT=0. For more information, please see https://github.com/ray-project/ray/issues/36949
╭────────────────────────────────────────────────────────────────────╮
│ Configuration for experiment     _train_tune_2023-11-06_09-43-10   │
├────────────────────────────────────────────────────────────────────┤
│ Search algorithm                 BasicVariantGenerator             │
│ Scheduler                        FIFOScheduler                     │
│ Number of trials                 10                                │
╰────────────────────────────────────────────────────────────────────╯

View detailed results here: C:/Users/wenex/ray_results/_train_tune_2023-11-06_09-43-10
To visualize your results with TensorBoard, run: `tensorboard --logdir C:/Users/wenex/ray_results/_train_tune_2023-11-06_09-43-10`
(pid=41536) 
(_train_tune pid=39772) Seed set to 14
2023-11-06 09:43:48,820 ERROR tune_controller.py:1502 -- Trial task failed for trial _train_tune_d78f7_00000
Traceback (most recent call last):
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\air\execution\_internal\event_manager.py", line 110, in resolve_future
    result = ray.get(future)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\_private\auto_init_hook.py", line 24, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\_private\client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\_private\worker.py", line 2547, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(FileNotFoundError): ray::ImplicitFunc.train() (pid=39772, ip=127.0.0.1, actor_id=1d5f844de08093a31b0ab2d401000000, repr=_train_tune)
  File "python\ray\_raylet.pyx", line 1616, in ray._raylet.execute_task
  File "python\ray\_raylet.pyx", line 1556, in ray._raylet.execute_task.function_executor
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\_private\function_manager.py", line 726, in actor_method_executor
    return method(__ray_actor, *args, **kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\tune\trainable\trainable.py", line 400, in train
    raise skipped from exception_cause(skipped)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\air\_internal\util.py", line 91, in run
    self._ret = self._target(*self._args, **self._kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\tune\trainable\function_trainable.py", line 383, in <lambda>
    training_func=lambda: self._trainable_func(self.config),
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\util\tracing\tracing_helper.py", line 467, in _resume_span
    return method(self, *_args, **_kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\tune\trainable\function_trainable.py", line 822, in _trainable_func
    output = fn()
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\ray\tune\trainable\util.py", line 321, in inner
    return trainable(config, **fn_kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\neuralforecast\common\_base_auto.py", line 207, in _train_tune
    _ = self._fit_model(
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\neuralforecast\common\_base_auto.py", line 336, in _fit_model
    model.fit(dataset, val_size=val_size, test_size=test_size)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\neuralforecast\common\_base_windows.py", line 734, in fit
    trainer.fit(self, datamodule=datamodule)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 951, in _run
    call._call_setup_hook(self)  # allow user to setup lightning_module in accelerator environment
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\pytorch_lightning\trainer\call.py", line 86, in _call_setup_hook
    if hasattr(logger, "experiment"):
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\lightning_fabric\loggers\logger.py", line 118, in experiment
    return fn(self)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\lightning_fabric\loggers\tensorboard.py", line 191, in experiment
    self._experiment = SummaryWriter(log_dir=self.log_dir, **self._kwargs)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\writer.py", line 300, in __init__
    self._get_file_writer()
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\writer.py", line 348, in _get_file_writer
    self.file_writer = FileWriter(logdir=self.logdir,
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\writer.py", line 104, in __init__
    self.event_writer = EventFileWriter(
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\event_file_writer.py", line 106, in __init__
    self._ev_writer = EventsWriter(os.path.join(
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\event_file_writer.py", line 43, in __init__
    self._py_recordio_writer = RecordWriter(self._file_name)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\record_writer.py", line 182, in __init__
    self._writer = open_file(path)
  File "C:\Users\wenex\Desktop\Time-Series-Forecasting\venv_3.9\lib\site-packages\tensorboardX\record_writer.py", line 61, in open_file
    return open(path, 'wb')
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\wenex\\ray_results\\_train_tune_2023-11-06_09-43-10\\_train_tune_d78f7_00000_0_batch_size=64,input_size=72,learning_rate=0.0002,max_steps=1000,random_seed=14,scaler_type=standard,step_2023-11-06_09-43-27\\lightning_logs\\version_0\\events.out.tfevents.1699235028.SGDT0038KRWZ2LS'

Trial _train_tune_d78f7_00000 errored after 0 iterations at 2023-11-06 09:43:48. Total running time: 21s
Error file: C:/Users/wenex/ray_results/_train_tune_2023-11-06_09-43-10/_train_tune_d78f7_00000_0_batch_size=64,input_size=72,learning_rate=0.0002,max_steps=1000,random_seed=14,scaler_type=standard,step_2023-11-06_09-43-27\error.txt

All subsequent trial tasks failed as well with the same FileNotFoundError. Has anyone come across this problem before? At the end of the run I get this error message:

2023-11-06 09:46:35,390 ERROR tune.py:1139 -- Trials did not complete: [_train_tune_d78f7_00000, _train_tune_d78f7_00001, 
 ... , _train_tune_d78f7_00009]

nhits model testing failed

No best trial found for the given metric: loss. This means that no trial has reported this metric, or all values reported for this metric are NaN. To not ignore NaN values, you can set the `filter_nan_and_inf` arg to False.

2023-11-06 09:46:35,491 WARNING experiment_analysis.py:596 -- Could not find best trial. Did you pass the correct `metric` parameter?

Versions / Dependencies

Windows 10

pandas - 2.1.1
ray - 2.7.1
numpy - 1.25.2
neuralforecast - 1.6.4

Reproduction script

from ray import tune
from neuralforecast.auto import AutoNBEATS
from neuralforecast.core import NeuralForecast

# test_number (forecast horizon) and train_target (training DataFrame) are defined earlier in my code
models = [AutoNBEATS(h=test_number, num_samples=10)]

# Edits made to the dataframe to feed into NBEATS as input
df_renamed = train_target.rename(columns={'Total Invoiced Units': 'y'})  # Original target column name is 'Total Invoiced Units'
df_renamed['unique_id'] = 0  # All values belong to the same series; there are no multiple series.

nf = NeuralForecast(models=models, freq='MS')  # Monthly data, i.e. 2020-01-01, 2020-02-01, ...
nf.fit(df=df_renamed)
y_hat_df = nf.predict().reset_index()
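
For context, tune.choice() comes in through the Auto model's hyperparameter search space; the run above uses the default search space, but an explicit config can also be passed (the values below are purely illustrative, not the ones from my run):

from ray import tune
from neuralforecast.auto import AutoNBEATS

# Illustrative custom search space; the parameter values here are examples only
config = {
    "input_size": tune.choice([12, 24]),              # lookback window length
    "max_steps": tune.choice([500, 1000]),            # training steps per trial
    "learning_rate": tune.loguniform(1e-4, 1e-2),
    "scaler_type": tune.choice(['standard', 'robust']),
    "random_seed": tune.randint(1, 20),
}
models = [AutoNBEATS(h=test_number, config=config, num_samples=10)]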

Issue Severity

None

jmoralez commented 10 months ago

Hey @BowlOfFruits, thanks for using neuralforecast. This has been reported in #526; it's a problem with the names of the files created by ray. I'll close this one to focus the conversation there. In the meantime, you can try our optuna backend instead.
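
Something along these lines should work (untested sketch; it reuses test_number and df_renamed from the reproduction script above and keeps the default search space):

from neuralforecast.auto import AutoNBEATS
from neuralforecast.core import NeuralForecast

# Same setup as the reproduction script, but tuning with optuna instead of ray
# (test_number and df_renamed as defined in the reproduction script above)
models = [AutoNBEATS(h=test_number, num_samples=10, backend='optuna')]
nf = NeuralForecast(models=models, freq='MS')
nf.fit(df=df_renamed)
y_hat_df = nf.predict().reset_index()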