Nixtla / neuralforecast

Scalable and user friendly neural 🧠 forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

Multi-GPU error #617

Closed: batuhan3526 closed this issue 6 months ago

batuhan3526 commented 1 year ago

What happened + What you expected to happen

I just discovered this library and wanted to give it a try. I copied the code from the overview page and got the error below. I have 2 GPUs. I think it's a multi-GPU problem, since the error mentions DDP. I guess the library automatically selects cuda:0, and I'm skeptical about its multi-GPU compatibility. My workload isn't large enough to justify multiple GPUs anyway, but I think I should be able to pick a single GPU and continue.

Code:

import numpy as np
import pandas as pd
from IPython.display import display, Markdown

import matplotlib.pyplot as plt
from neuralforecast import NeuralForecast
from neuralforecast.models import NBEATS, NHITS
from neuralforecast.utils import AirPassengersDF

# Split data and declare panel dataset
Y_df = AirPassengersDF
Y_train_df = Y_df[Y_df.ds<='1959-12-31'] # 132 train
Y_test_df = Y_df[Y_df.ds>'1959-12-31'] # 12 test

# Fit and predict with NBEATS and NHITS models
horizon = len(Y_test_df)
models = [NBEATS(input_size=2 * horizon, h=horizon, max_epochs=50),
          NHITS(input_size=2 * horizon, h=horizon, max_epochs=50)]
nf = NeuralForecast(models=models, freq='M')
nf.fit(df=Y_train_df)
Y_hat_df = nf.predict().reset_index()

# Plot predictions
fig, ax = plt.subplots(1, 1, figsize = (20, 7))
Y_hat_df = Y_test_df.merge(Y_hat_df, how='left', on=['unique_id', 'ds'])
plot_df = pd.concat([Y_train_df, Y_hat_df]).set_index('ds')

plot_df[['y', 'NBEATS', 'NHITS']].plot(ax=ax, linewidth=2)

ax.set_title('AirPassengers Forecast', fontsize=22)
ax.set_ylabel('Monthly Passengers', fontsize=20)
ax.set_xlabel('Timestamp [t]', fontsize=20)
ax.legend(prop={'size': 15})
ax.grid()

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ <ipython-input-1-334fb7058e65>:20 in <module>                                                    │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\neuralforecast\core.py:238 in fit                                 │
│                                                                                                  │
│   235 │   │                                                                                      │
│   236 │   │   # train + validation                                                               │
│   237 │   │   for model in self.models:                                                          │
│ ❱ 238 │   │   │   model.fit(self.dataset, val_size=val_size)                                     │
│   239 │   │   # train with the full dataset                                                      │
│   240 │   │                                                                                      │
│   241 │   │   self._fitted = True                                                                │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\neuralforecast\common\_base_windows.py:610 in fit                 │
│                                                                                                  │
│   607 │   │   self.trainer_kwargs["val_check_interval"] = val_check_interval                     │
│   608 │   │   self.trainer_kwargs["check_val_every_n_epoch"] = check_val_every_n_epoch           │
│   609 │   │                                                                                      │
│ ❱ 610 │   │   trainer = pl.Trainer(**self.trainer_kwargs)                                        │
│   611 │   │   trainer.fit(self, datamodule=datamodule)                                           │
│   612 │                                                                                          │
│   613 │   def predict(                                                                           │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\pytorch_lightning\utilities\argparse.py:69 in insert_env_defaults │
│                                                                                                  │
│   66 │   │   kwargs = dict(list(env_variables.items()) + list(kwargs.items()))                   │
│   67 │   │                                                                                       │
│   68 │   │   # all args were already moved to kwargs                                             │
│ ❱ 69 │   │   return fn(self, **kwargs)                                                           │
│   70 │                                                                                           │
│   71 │   return cast(_T, insert_env_defaults)                                                    │
│   72                                                                                             │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\pytorch_lightning\trainer\trainer.py:393 in __init__              │
│                                                                                                  │
│    390 │   │   # init connectors                                                                 │
│    391 │   │   self._data_connector = _DataConnector(self)                                       │
│    392 │   │                                                                                     │
│ ❱  393 │   │   self._accelerator_connector = _AcceleratorConnector(                              │
│    394 │   │   │   devices=devices,                                                              │
│    395 │   │   │   accelerator=accelerator,                                                      │
│    396 │   │   │   strategy=strategy,                                                            │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:166 │
│ in __init__                                                                                      │
│                                                                                                  │
│   163 │   │   if self._strategy_flag == "auto":                                                  │
│   164 │   │   │   self._strategy_flag = self._choose_strategy()                                  │
│   165 │   │   # In specific cases, ignore user selection and fall back to a different strategy   │
│ ❱ 166 │   │   self._check_strategy_and_fallback()                                                │
│   167 │   │   self._init_strategy()                                                              │
│   168 │   │                                                                                      │
│   169 │   │   # 5. Instantiate Precision Plugin                                                  │
│                                                                                                  │
│ C:\Anaconda3\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py:465 │
│ in _check_strategy_and_fallback                                                                  │
│                                                                                                  │
│   462 │   │   │   │   f"You selected strategy to be `{FSDPStrategy.strategy_name}`, but GPU ac   │
│   463 │   │   │   )                                                                              │
│   464 │   │   if strategy_flag in _DDP_FORK_ALIASES and "fork" not in torch.multiprocessing.ge   │
│ ❱ 465 │   │   │   raise ValueError(                                                              │
│   466 │   │   │   │   f"You selected `Trainer(strategy='{strategy_flag}')` but process forking   │
│   467 │   │   │   │   f" platform. We recommed `Trainer(strategy='ddp_spawn')` instead."         │
│   468 │   │   │   )                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: You selected `Trainer(strategy='ddp_fork')` but process forking is not supported on this platform. We 
recommed `Trainer(strategy='ddp_spawn')` instead.

Versions / Dependencies

Python: 3.9
OS: Windows 11 Pro
neuralforecast: 1.5.0
pandas: 1.5.3
numpy: 1.24.3

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 7 months ago

Hey @batuhan3526, sorry for the late reply. The default for strategy in the Trainer is auto, which will try to use both of your GPUs. If you want to use only one you can try:
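
A minimal sketch of that idea, assuming the models forward extra keyword arguments to the Lightning Trainer (as the `pl.Trainer(**self.trainer_kwargs)` line in the traceback suggests). `single_gpu_kwargs` and `build_models` are hypothetical helper names used only for illustration and are not part of neuralforecast; `accelerator` and `devices` are standard `pl.Trainer` arguments.

```python
def single_gpu_kwargs(gpu_index=0):
    """Hypothetical helper: Trainer arguments that pin training to one GPU."""
    # With a single device, Lightning has no reason to choose a DDP strategy.
    return {"accelerator": "gpu", "devices": [gpu_index]}


def build_models(horizon, gpu_index=0):
    """Build the two models from the report, each restricted to one GPU."""
    # Imported lazily so the helper above works without neuralforecast installed.
    from neuralforecast.models import NBEATS, NHITS

    # Assumption: extra keyword arguments are passed through to pl.Trainer.
    kwargs = single_gpu_kwargs(gpu_index)
    return [
        NBEATS(input_size=2 * horizon, h=horizon, max_epochs=50, **kwargs),
        NHITS(input_size=2 * horizon, h=horizon, max_epochs=50, **kwargs),
    ]
```

In the original script this would replace the `models = [...]` list with `models = build_models(horizon)`. Another option that does not touch the model arguments is to set the `CUDA_VISIBLE_DEVICES` environment variable to `0` before starting Python, so that only one GPU is visible to PyTorch.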