To Reproduce
import pandas as pd
import numpy as np
from gluonts.dataset.pandas import PandasDataset
from gluonts.evaluation import make_evaluation_predictions
from gluonts.torch.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.evaluation import Evaluator
n = 100
freq = 'D'
date = "2015-04-07 00:00:00"
df = pd.DataFrame(np.random.randn(n).astype(np.float32), index=pd.period_range(date, periods=n, freq=freq))
dataset = PandasDataset(df, target=0)
model = SimpleFeedForwardEstimator(prediction_length=5, trainer_kwargs={"max_epochs": 1, "accelerator": "gpu"})
predictor = model.train(dataset)
forecast_it, ts_it = make_evaluation_predictions(
    dataset, predictor=predictor, num_samples=100
)
# Error 1
next(forecast_it).quantile(0.5)
# Error 2
evaluator = Evaluator()
agg_metrics, item_metrics = evaluator(
    ts_it, forecast_it,
)
Error message or code output
Error 1
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 next(forecast_it).quantile(0.5)

File /export/home/forecasting/aiops_tsf/benchmark_exp/pytorch_venv/lib/python3.9/site-packages/gluonts/torch/model/forecast.py:89, in DistributionForecast.quantile(self, level)
     87 def quantile(self, level: Union[float, str]) -> np.ndarray:
     88     level = Quantile.parse(level).value
---> 89     return self.distribution.icdf(torch.tensor([level])).cpu().numpy()

File /opt/conda/lib/python3.9/site-packages/torch/distributions/transformed_distribution.py:184, in TransformedDistribution.icdf(self, value)
    179 def icdf(self, value):
    180     """
    181     Computes the inverse cumulative distribution function using
    182     transform(s) and computing the score of the base distribution.
    183     """
--> 184     value = self._monotonize_cdf(value)
    185     value = self.base_dist.icdf(value)
    186     for transform in self.transforms:

File /opt/conda/lib/python3.9/site-packages/torch/distributions/transformed_distribution.py:164, in TransformedDistribution._monotonize_cdf(self, value)
    162 if isinstance(sign, int) and sign == 1:
    163     return value
--> 164 return sign * (value - 0.5) + 0.5

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Error 2
Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/opt/conda/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 120, in rebuild_cuda_tensor
    torch.cuda._lazy_init()
  File "/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Environment
Operating system:
Python version:
GluonTS version: 0.12.1
MXNet version:
Description
As per the title, DistributionForecast fails on GPU. There are two sources of error. First, at
https://github.com/awslabs/gluonts/blob/c5b64b4952a89f172c9c1e1fa9f1cfc0ee684a95/src/gluonts/torch/model/forecast.py#L89
torch.tensor([level]) is created on the CPU, so it ends up on a different device from the parameters in self.distribution, which live on the GPU (Error 1).
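One possible way to avoid the device mismatch is to create the level tensor on the same device as the distribution's parameters. The snippet below is only a sketch of that idea using plain PyTorch, not the GluonTS implementation: the helper name `quantile_on_device` and its explicit `device` argument are hypothetical, and the `TransformedDistribution` is a stand-in for the one seen in the traceback. It falls back to the CPU when no GPU is available.

```python
import numpy as np
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform


def quantile_on_device(distribution, level: float, device) -> np.ndarray:
    """Hypothetical helper: evaluate icdf with the level tensor on `device`."""
    # Creating the tensor directly on the target device avoids mixing
    # CPU and CUDA tensors inside `icdf`.
    level_tensor = torch.tensor([level], device=device)
    return distribution.icdf(level_tensor).cpu().numpy()


# Use the GPU when present; otherwise the same code runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in distribution: a standard normal shifted by 2 and scaled by 3,
# with all parameters placed on `device`.
base = Normal(torch.zeros(5, device=device), torch.ones(5, device=device))
transform = AffineTransform(
    loc=torch.full((5,), 2.0, device=device),
    scale=torch.full((5,), 3.0, device=device),
)
distribution = TransformedDistribution(base, [transform])

median = quantile_on_device(distribution, 0.5, device)
```

In `DistributionForecast.quantile` the device would have to come from the distribution itself; how best to obtain it there is left open here.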
Second, at
https://github.com/awslabs/gluonts/blob/c5b64b4952a89f172c9c1e1fa9f1cfc0ee684a95/src/gluonts/model/forecast_generator.py#L202
DistributionForecastGenerator should use predict_to_numpy, in line with the Sample and Quantile forecast generators. As it stands, the forecasts still hold CUDA tensors, which the forked worker processes cannot unpickle (Error 2).
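The general idea behind the second point can be sketched in plain PyTorch: convert network outputs to NumPy arrays on the host before they are handed to worker processes, so that nothing CUDA-backed needs to be unpickled there. The helper `to_numpy` below is hypothetical and is not GluonTS's `predict_to_numpy`; the parameter names `loc` and `scale` are placeholders.

```python
import pickle

import numpy as np
import torch


def to_numpy(outputs):
    """Hypothetical helper: recursively convert tensors to host NumPy arrays."""
    if isinstance(outputs, torch.Tensor):
        # detach() drops autograd state, cpu() copies off the GPU if needed.
        return outputs.detach().cpu().numpy()
    if isinstance(outputs, (list, tuple)):
        return type(outputs)(to_numpy(item) for item in outputs)
    if isinstance(outputs, dict):
        return {key: to_numpy(value) for key, value in outputs.items()}
    return outputs


# Use the GPU when present; otherwise the same code runs on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder distribution parameters as a network might return them.
params = {
    "loc": torch.zeros(3, device=device),
    "scale": torch.ones(3, device=device),
}

host_params = to_numpy(params)

# A pickle round-trip mimics what multiprocessing does when it sends
# objects to a worker; NumPy arrays carry no CUDA state.
restored = pickle.loads(pickle.dumps(host_params))
```

A distribution-based forecast would then need to be rebuilt from these arrays on the CPU, or evaluated before conversion; which of the two fits GluonTS better is not settled by this sketch.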