Nixtla / neuralforecast

Scalable and user friendly neural :brain: forecasting algorithms.
https://nixtlaverse.nixtla.io/neuralforecast
Apache License 2.0

[Loss] Poor performance with the NegativeBinomial DistributionLoss #712

Open Antoine-Schwartz opened 1 year ago

Antoine-Schwartz commented 1 year ago

What happened + What you expected to happen

I suspect a bug around the negative binomial. Its performance seems to be off compared with the other available distributions, even on positive count data, where it is supposed to do well.

Perhaps there is a conflict with the way the input data is scaled? I know that PyTorch Forecasting blocks the use of the negative binomial loss when centered normalization is applied: https://pytorch-forecasting.readthedocs.io/en/stable/_modules/pytorch_forecasting/metrics/distributions.html#NegativeBinomialDistributionLoss
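To make that hypothesis concrete, here is a small sketch using torch.distributions directly (this is not NeuralForecast's internal loss code, just an illustration): once targets are shifted or scaled below zero, or off the integer grid, they fall outside the negative binomial's support.

import torch
from torch.distributions import NegativeBinomial

dist = NegativeBinomial(
    total_count=torch.tensor(10.0), logits=torch.tensor(0.0), validate_args=True
)

# Raw counts are inside the support (non-negative integers), so this is fine.
print(dist.log_prob(torch.tensor(3.0)))

# After a centered scaling (standard/robust), targets can become negative and
# non-integer, i.e. outside the support; log_prob then raises (or, without
# argument validation, returns meaningless log-likelihoods).
try:
    print(dist.log_prob(torch.tensor(-0.7)))
except ValueError as err:
    print("out of support:", err)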

I can't share the results on my data, but I've coded a quick example that illustrates the problem.

Versions / Dependencies

neuralforecast==1.7.4 torch==2.3.1+cu121

Reproduction script

import itertools

import numpy as np
import pandas as pd

from neuralforecast import NeuralForecast
from neuralforecast.models import DeepAR, NHITS, TFT
from neuralforecast.losses.pytorch import DistributionLoss
from neuralforecast.losses.numpy import mae
from neuralforecast.utils import AirPassengersPanel

Y_df = AirPassengersPanel

# One model per (architecture, distribution) combination, otherwise identical settings.
model_classes = {"DeepAR": DeepAR, "TFT": TFT, "NHITS": NHITS}
distributions = ["Poisson", "Normal", "StudentT", "NegativeBinomial"]

nf = NeuralForecast(
    models=[
        model_classes[model](
            h=12,
            input_size=48,
            max_steps=100,
            scaler_type="robust",
            loss=DistributionLoss(distr, level=[]),
            alias=f"{model}-{distr}",
            enable_model_summary=False,
            enable_checkpointing=False,
            enable_progress_bar=False,
            logger=False,
        )
        for model, distr in itertools.product(model_classes, distributions)
    ],
    freq="M",
)
cv_df = nf.cross_validation(Y_df, n_windows=5, step_size=12).reset_index()


def evaluate(df):
    # MAE of each model's median forecast against a seasonal naive baseline (lag 12).
    eval_ = {}
    df = df.merge(Y_df[["unique_id", "ds", "y_[lag12]"]], how="left").rename(
        columns={"y_[lag12]": "seasonal_naive"}
    )
    models = ["seasonal_naive"] + list(df.columns[df.columns.str.contains("median")])
    for model in models:
        eval_[model] = {}
        eval_[model][mae.__name__] = int(np.round(mae(df["y"].values, df[model].values), 0))
    eval_df = pd.DataFrame(eval_).rename_axis("metric")
    return eval_df


cv_df.groupby("cutoff").apply(lambda df: evaluate(df))

Output: (screenshot of the resulting per-cutoff MAE table)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Antoine-Schwartz commented 7 months ago

Hello @jmoralez and @cchallu,

I'm bringing this up again because it's becoming a sticking point for me: I need output samples rather than quantiles, and in my field we deal with count data (i.e. positive integers), where historically Tweedie and NegativeBinomial have served us well. I've tried to narrow down the problem by also looking at NBMM, but it seems to suffer from the same issue overall. In my opinion it looks correlated with the scaling of the data in some way, since the results are even more catastrophic relative to the other distributions with scaler_type="identity" (with NHITS, for example).
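For reference, a minimal variation of the reproduction script above is enough to see the identity-scaler case (same data, imports, and evaluate helper; only the scaler and the model selection change, and the nf_identity / cv_identity_df names are just illustrative):

# Sketch only: reuse Y_df, DistributionLoss, NHITS and evaluate() from the
# reproduction script, swapping the scaler to "identity" for comparison with
# the "robust" run.
nf_identity = NeuralForecast(
    models=[
        NHITS(
            h=12,
            input_size=48,
            max_steps=100,
            scaler_type="identity",
            loss=DistributionLoss(distr, level=[]),
            alias=f"NHITS-identity-{distr}",
            enable_model_summary=False,
            enable_checkpointing=False,
            enable_progress_bar=False,
            logger=False,
        )
        for distr in ["Poisson", "Normal", "StudentT", "NegativeBinomial"]
    ],
    freq="M",
)
cv_identity_df = nf_identity.cross_validation(Y_df, n_windows=5, step_size=12).reset_index()
cv_identity_df.groupby("cutoff").apply(lambda df: evaluate(df))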

If you have even a hunch, I can take the time to dig deeper if needed.

Thanks in advance!