facebook / prophet

Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
https://facebook.github.io/prophet
MIT License

[BUG] Zero Division in diagnostics.performance_metrics() causing failed assertion #2577

Closed ThomasChia closed 1 month ago

ThomasChia commented 2 months ago

Issue

When calculating the SMAPE metric in diagnostics.performance_metrics(), a division by zero can occur if y and yhat are both zero in the same row.

def smape(df, w):
    """Symmetric mean absolute percentage error
    based on Chen and Yang (2004) formula

    Parameters
    ----------
    df: Cross-validation results dataframe.
    w: Aggregation window size.

    Returns
    -------
    Dataframe with columns horizon and smape.
    """
    # POSSIBLE ZERO DIVISION: the denominator is 0 when y and yhat are both 0
    sape = np.abs(df['y'] - df['yhat']) / ((np.abs(df['y']) + np.abs(df['yhat'])) / 2)
    if w < 0:
        return pd.DataFrame({'horizon': df['horizon'], 'smape': sape})
    return rolling_mean_by_h(
        x=sape.values, h=df['horizon'].values, w=w, name='smape'
    )

This does not raise an error directly; instead, it produces np.nan values wherever the zero division occurs. When rolling_mean_by_h() is then called, it performs a groupby() that removes any np.nan values. This becomes an issue in the main performance_metrics() function because of the following assert:

assert np.array_equal(res['horizon'].values, res_m['horizon'].values)

This assert sits in a loop over the metrics and checks that each metric returns the same horizon values as the others. It fails in the above scenario because the np.nan values are removed, so smape returns fewer values than the other metrics.
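The NaN can be reproduced without running Prophet by evaluating the same per-row expression on a small hand-made frame. The sketch below also shows one possible guard that maps a 0/0 row to zero error. The helper name sape_zero_safe is hypothetical and not part of Prophet, and treating such rows as zero error is an assumption rather than an agreed fix.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cross-validation output; the first row has y == yhat == 0.
df = pd.DataFrame({"y": [0.0, 2.0, 0.0], "yhat": [0.0, 1.0, 3.0]})

# Same expression as in smape(): the first row evaluates 0 / 0, which is NaN.
sape = np.abs(df["y"] - df["yhat"]) / ((np.abs(df["y"]) + np.abs(df["yhat"])) / 2)
nan_rows = int(sape.isna().sum())  # number of rows that hit the zero division


def sape_zero_safe(y, yhat):
    """Hypothetical helper (not Prophet API): per-row SAPE with 0/0 mapped to 0."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    numerator = np.abs(y - yhat)
    denominator = (np.abs(y) + np.abs(yhat)) / 2
    # Start from zeros and divide only where the denominator is non-zero,
    # so rows with y == yhat == 0 keep the value 0 instead of becoming NaN.
    out = np.zeros_like(numerator)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


safe = sape_zero_safe(df["y"], df["yhat"])
```

With a guard like this every row keeps a finite value, so the metric would have the same number of usable rows as the others.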

Replication

Here is how you can replicate this issue:

import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')

df['ds'] = pd.to_datetime(df['ds'])
df.loc[df['ds'].dt.dayofweek == 6, 'y'] = 0

m = Prophet()
m.fit(df)

df_cv = cross_validation(m, '365 days', initial='1825 days', period='365 days')
df_cv['yhat'] = df_cv['yhat'].clip(lower=0)
metrics = performance_metrics(df_cv)

We set y to zero for every Sunday in the training data and clip negative forecasts to zero, which creates rows where y and yhat are both zero.
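Before calling performance_metrics(), it may help to check whether a cross-validation frame contains any rows that trigger this case. The following is a minimal sketch, assuming the frame has the y and yhat columns that cross_validation() returns. The function name zero_denominator_rows is hypothetical, and the frame is synthetic so the example runs without fitting a model.

```python
import pandas as pd


def zero_denominator_rows(df_cv: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows where |y| + |yhat| == 0 (SMAPE denominator is zero)."""
    mask = (df_cv["y"].abs() + df_cv["yhat"].abs()) == 0
    return df_cv[mask]


# Synthetic stand-in for df_cv; only the first row has y == yhat == 0.
example_cv = pd.DataFrame({
    "ds": pd.to_datetime(["2016-01-03", "2016-01-04", "2016-01-10"]),
    "y": [0.0, 8.1, 0.0],
    "yhat": [0.0, 7.9, 0.4],
})
affected = zero_denominator_rows(example_cv)
```

If the returned frame is non-empty, the smape calculation will produce np.nan for those rows and is likely to hit the assertion described above.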