Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

[Model] Getting mat1 and mat2 shapes cannot be multiplied error for model StemGNN #869

Closed iamyihwa closed 4 months ago

iamyihwa commented 7 months ago

### What happened + What you expected to happen

Hi, when I call fit() with StemGNN, I get an error saying that mat1 and mat2 shapes cannot be multiplied. NHITS runs fine with the same training data and val_size = 13.

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/multiprocessing.py", line 170, in _wrapping_function
    results = function(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self._run_sanity_check()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1062, in _run_sanity_check
    val_loop.run()
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 402, in validation_step
    return self._forward_redirection(self.model, self.lightning_module, "validation_step", *args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 633, in __call__
    wrapper_output = wrapper_module(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 626, in wrapped_forward
    out = method(*_args, **_kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/neuralforecast/common/_base_multivariate.py", line 450, in validation_step
    output = self(windows_batch)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/neuralforecast/models/stemgnn.py", line 332, in forward
    mul_L, attention = self.latent_correlation_layer(x)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/neuralforecast/models/stemgnn.py", line 299, in latent_correlation_layer
    attention = self.self_graph_attention(input)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-44f2d3db-9084-4f52-b993-91a01d4b6581/lib/python3.10/site-packages/neuralforecast/models/stemgnn.py", line 315, in self_graph_attention
    key = torch.matmul(input, self.weight_key)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x791 and 2x1)


### Versions / Dependencies

1.6.4

### Reproduction script

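from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import RMSE
from neuralforecast.models import StemGNN

# `horizon` and `train_df` are not defined in this snippet;
# they are assumed to exist in the reporter's session.
nf = NeuralForecast(
    models=[
        StemGNN(
            h=horizon,
            input_size=2 * horizon,
            n_series=2,
            stat_exog_list=['airline1'],
            # futr_exog_list=['trend'],
            scaler_type='robust',
            max_steps=200,
            early_stop_patience_steps=-1,
            val_check_steps=10,
            learning_rate=1e-3,
            loss=RMSE(),  # MAE(),
            valid_loss=None,
            batch_size=32,
        )
    ],
    freq='4W-SAT',
)

nf.fit(train_df, val_size=13)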



### Issue Severity

None

iamyihwa commented 7 months ago

After changing n_series to the number of unique time series (train_df['unique_id'].nunique()), the error becomes: mat1 and mat2 shapes cannot be multiplied (1581x791 and 1581x1)

The first dimension changed (from 2 to 1581), but the second dimension of mat1 is still 791, so the mismatch remains.
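
For readers unfamiliar with the message, the rule behind it can be sketched in plain Python. The helper names `matmul_result_shape` and `explain` are hypothetical (they are not part of PyTorch or neuralforecast); the sketch only mirrors the 2-D rule that the second dimension of mat1 must equal the first dimension of mat2, using the shapes quoted in this thread.

```python
def matmul_result_shape(mat1_shape, mat2_shape):
    """Return the result shape of a 2-D matrix product, or raise ValueError.

    Hypothetical helper: it mimics the wording of the PyTorch error message
    but does not call PyTorch.
    """
    rows1, cols1 = mat1_shape
    rows2, cols2 = mat2_shape
    if cols1 != rows2:
        raise ValueError(
            "mat1 and mat2 shapes cannot be multiplied "
            f"({rows1}x{cols1} and {rows2}x{cols2})"
        )
    return (rows1, cols2)


def explain(mat1_shape, mat2_shape):
    """Describe whether the two shapes can be multiplied."""
    try:
        return f"ok -> {matmul_result_shape(mat1_shape, mat2_shape)}"
    except ValueError as err:
        return f"error: {err}"


# Shapes reported in this thread (n_series=2, then n_series=1581).
print(explain((2, 791), (2, 1)))
print(explain((1581, 791), (1581, 1)))
# A compatible pair, for comparison.
print(explain((791, 2), (2, 1)))
```

In both reported cases it is the 791 in mat1 that fails to match the first dimension of the weight (2 or 1581), which is consistent with the numbers moving when n_series changed while the error itself stayed.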

elephaint commented 5 months ago

Hi,

n_series should indeed be the number of unique series in the dataset. Could you check whether this issue still persists on the latest version (1.7.1)?

If not, can you provide a full working example that reproduces the issue?
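
A minimal pre-flight sketch of the two checks suggested above, using only the standard library. The helper names (`installed_version`, `version_tuple`, `count_series`) and the toy rows are hypothetical; with a pandas DataFrame the series count is simply train_df['unique_id'].nunique(), as used elsewhere in this thread.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '1.7.1' into (1, 7, 1).

    Simplification: parsing stops at the first non-numeric component,
    so pre/post-release suffixes are ignored.
    """
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def count_series(rows):
    """Count distinct `unique_id` values in long-format rows (list of dicts)."""
    return len({row["unique_id"] for row in rows})


# Toy long-format data standing in for train_df (hypothetical values).
rows = [
    {"unique_id": "A", "ds": "2024-01-06", "y": 1.0},
    {"unique_id": "A", "ds": "2024-02-03", "y": 2.0},
    {"unique_id": "B", "ds": "2024-01-06", "y": 3.0},
]
n_series = count_series(rows)
print("n_series to pass to the model:", n_series)

found = installed_version("neuralforecast")
if found is None:
    print("neuralforecast is not installed in this environment")
else:
    is_older = version_tuple(found) < version_tuple("1.7.1")
    print(f"neuralforecast {found}; older than 1.7.1: {is_older}")
```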

Teculos commented 1 month ago

I'm seeing the same issue with the Auto* models and have a minimal reproduction below. I am reducing the ETTm2 dataset size for memory reasons specific to the system I'm using, but I don't think this should affect the results.

I'm also getting a similar issue with TSMixer and have included it in my example, along with NHITS, which does not have the issue.


import optuna
import pandas as pd

from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import MAE
from neuralforecast.auto import AutoStemGNN, AutoTSMixer, AutoNHITS, AutoMLPMultivariate

from datasetsforecast.long_horizon import LongHorizon

# Change this to your own data to try the model
Y_df, _, _ = LongHorizon.load(directory='./', group='ETTm2')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df = Y_df[["ds", "unique_id","y"]]

# Reduce the dataset for memory reasons on the system I'm using (should not affect reproduction of the issue)
Y_df = Y_df[Y_df.ds <= Y_df.ds.median()]
Y_df = Y_df[Y_df.ds <= Y_df.ds.median()]

H=96
num_samples=10
num_gpus=1

nhits_default_config = AutoNHITS.get_default_config(h=H, backend="optuna")
tsmixer_default_config = AutoTSMixer.get_default_config(h=H, backend="optuna", n_series=Y_df["unique_id"].nunique())
stemgnn_default_config = AutoStemGNN.get_default_config(h=H, backend="optuna", n_series=Y_df["unique_id"].nunique())
mlp_default_config = AutoMLPMultivariate.get_default_config(h=H, backend="optuna", n_series=Y_df["unique_id"].nunique())

models = [
    AutoNHITS(
        h=H,
        config=nhits_default_config,
        gpus=num_gpus,
        valid_loss=MAE(),
        search_alg=optuna.samplers.TPESampler(),
        backend="optuna",
        num_samples=num_samples,
    ),
    AutoTSMixer(
        h=H,
        n_series=Y_df["unique_id"].nunique(),
        config=tsmixer_default_config,
        gpus=num_gpus,
        valid_loss=MAE(),
        search_alg=optuna.samplers.TPESampler(),
        backend="optuna",
        num_samples=num_samples,
    ),
    AutoStemGNN(
        h=H,
        n_series=Y_df["unique_id"].nunique(),
        config=stemgnn_default_config,
        gpus=num_gpus,
        valid_loss=MAE(),
        search_alg=optuna.samplers.TPESampler(),
        backend="optuna",
        num_samples=num_samples,
    ),
    AutoMLPMultivariate(
        h=H,
        n_series=Y_df["unique_id"].nunique(),
        config=mlp_default_config,
        gpus=num_gpus,
        valid_loss=MAE(),
        search_alg=optuna.samplers.TPESampler(),
        backend="optuna",
        num_samples=num_samples,
    ),
]

# Fit one model at a time; index 3 selects AutoMLPMultivariate.
nf = NeuralForecast(models=[models[3]], freq="15min")

Y_df.sort_values("ds", inplace=True)
nf.fit(df=Y_df)

I also reproduced these errors with the base models.


from neuralforecast.models import StemGNN, TSMixer, NHITS, MLPMultivariate

models = [
    MLPMultivariate(
        h=H,
        input_size=H,
        n_series=Y_df["unique_id"].nunique(),
        batch_size=25,
        max_steps=100,
        loss=MAE(),
    ),
    NHITS(
        h=H,
        input_size=H,
        windows_batch_size=25,
        max_steps=100,
        loss=MAE(),
    ),
    TSMixer(
        h=H,
        input_size=H,
        batch_size=25,
        max_steps=100,
        n_series=Y_df["unique_id"].nunique(),
        loss=MAE(),
    ),
    StemGNN(
        h=H,
        input_size=H,
        batch_size=25,
        max_steps=100,
        n_series=Y_df["unique_id"].nunique(),
        loss=MAE(),
    ),
]

nf = NeuralForecast(models=[models[0]], freq="15min")

Y_df.sort_values("ds", inplace=True)
nf.fit(df=Y_df)

neuralforecast version: 1.7.3

Issue Severity High: It blocks me from completing my task.

EDIT: Added MLPMultivariate to both examples. It seems as though all multivariate models are suffering from the same error.