DistributionForecast fails on GPU

Description

As the title says, DistributionForecast fails when the model runs on GPU. There are two sources of error. First:

https://github.com/awslabs/gluonts/blob/c5b64b4952a89f172c9c1e1fa9f1cfc0ee684a95/src/gluonts/torch/model/forecast.py#L89

Here torch.tensor([level]) is created on the CPU, while the parameters of self.distribution are on the GPU, so the icdf call raises a device-mismatch error.
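A minimal sketch of one possible fix, assuming the device of the distribution's parameters is known at the call site: create the quantile-level tensor on that same device before calling icdf. The helper name `quantile_on_device` and its `device` argument are hypothetical and are not part of GluonTS; a plain `Normal` stands in for the forecast distribution.

```python
import numpy as np
import torch
from torch.distributions import Distribution, Normal


def quantile_on_device(
    distribution: Distribution, level: float, device: torch.device
) -> np.ndarray:
    # Hypothetical helper (not GluonTS API): build the level tensor on the
    # same device as the distribution parameters so icdf sees one device.
    q = torch.tensor([level], device=device)
    # Move the result to host memory before converting to NumPy.
    return distribution.icdf(q).cpu().numpy()


# Use the GPU when one is available; the sketch also runs on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loc = torch.zeros(3, device=device)
scale = torch.ones(3, device=device)
dist = Normal(loc, scale)

median = quantile_on_device(dist, 0.5, loc.device)
```

How the device is obtained inside DistributionForecast is left open here; the sketch only shows that the level tensor and the parameters need to agree.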

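Second, https://github.com/awslabs/gluonts/blob/c5b64b4952a89f172c9c1e1fa9f1cfc0ee684a95/src/gluonts/model/forecast_generator.py#L202 DistributionForecastGenerator should use predict_to_numpy, in line with the Sample and Quantile forecast generators. Without it, the forecasts appear to carry CUDA tensors, which the Evaluator's forked worker processes cannot unpickle (see Error 2).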
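The general idea behind converting network outputs to NumPy before they leave the predictor can be sketched as follows. This is an illustration only: `tensors_to_numpy` is a hypothetical helper, not the GluonTS `predict_to_numpy` function, and the tuple of tensors stands in for whatever distribution arguments the network returns.

```python
import pickle

import numpy as np
import torch


def tensors_to_numpy(outputs):
    # Hypothetical helper: detach each tensor from the autograd graph,
    # copy it to host memory, and convert it to a NumPy array.
    return tuple(t.detach().cpu().numpy() for t in outputs)


# Use the GPU when one is available; the sketch also runs on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dist_args = (
    torch.zeros(2, 5, device=device),  # stand-in for a location parameter
    torch.ones(2, 5, device=device),   # stand-in for a scale parameter
)

host_args = tensors_to_numpy(dist_args)

# NumPy arrays pickle without touching CUDA, so they can be sent to
# worker processes regardless of the multiprocessing start method.
restored = pickle.loads(pickle.dumps(host_args))
```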

To Reproduce

import pandas as pd
import numpy as np
from gluonts.dataset.pandas import PandasDataset
from gluonts.evaluation import make_evaluation_predictions
from gluonts.torch.model.simple_feedforward import SimpleFeedForwardEstimator
from gluonts.evaluation import Evaluator

n = 100
freq = 'D'
date = "2015-04-07 00:00:00"

df = pd.DataFrame(np.random.randn(n).astype(np.float32), index=pd.period_range(date, periods=n, freq=freq))
dataset = PandasDataset(df, target=0)

model = SimpleFeedForwardEstimator(prediction_length=5, trainer_kwargs={"max_epochs": 1, "accelerator": "gpu"})
predictor = model.train(dataset)

forecast_it, ts_it = make_evaluation_predictions(
    dataset, predictor=predictor, num_samples=100
)

# Error 1: quantile() raises a device-mismatch RuntimeError
next(forecast_it).quantile(0.5)

# Error 2: the Evaluator's worker processes fail while unpickling CUDA tensors
evaluator = Evaluator()
agg_metrics, item_metrics = evaluator(
    ts_it, forecast_it,
)

Error message or code output

Error 1

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 next(forecast_it).quantile(0.5)

File /export/home/forecasting/aiops_tsf/benchmark_exp/pytorch_venv/lib/python3.9/site-packages/gluonts/torch/model/forecast.py:89, in DistributionForecast.quantile(self, level)
     87 def quantile(self, level: Union[float, str]) -> np.ndarray:
     88     level = Quantile.parse(level).value
---> 89     return self.distribution.icdf(torch.tensor([level])).cpu().numpy()

File /opt/conda/lib/python3.9/site-packages/torch/distributions/transformed_distribution.py:184, in TransformedDistribution.icdf(self, value)
    179 def icdf(self, value):
    180     """
    181     Computes the inverse cumulative distribution function using
    182     transform(s) and computing the score of the base distribution.
    183     """
--> 184     value = self._monotonize_cdf(value)
    185     value = self.base_dist.icdf(value)
    186     for transform in self.transforms:

File /opt/conda/lib/python3.9/site-packages/torch/distributions/transformed_distribution.py:164, in TransformedDistribution._monotonize_cdf(self, value)
    162 if isinstance(sign, int) and sign == 1:
    163     return value
--> 164 return sign * (value - 0.5) + 0.5

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Error 2

Traceback (most recent call last):
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/opt/conda/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 120, in rebuild_cuda_tensor
    torch.cuda._lazy_init()
  File "/opt/conda/lib/python3.9/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Environment

Operating system:
Python version:
GluonTS version: 0.12.1
MXNet version:
