Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0

divide by zero encountered in log return 0.5 *np.log(res) #434

Closed kiransview closed 1 year ago

kiransview commented 1 year ago

I am getting this error with the AutoARIMA model. I have about 25,000 data points covering 12 months of data.

speedyb74 commented 1 year ago

Hi kiransview,

I have just gotten the same error message. Digging deeper and searching other issues about zero division, I found this: https://github.com/Nixtla/statsforecast/issues/182

It has solved my issue.

BR

iamyihwa commented 1 year ago

The hack above didn't solve my issue. I am getting a similar error that says "ZeroDivisionError: division by zero".

The error seems to have occurred here:

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:940, in _StatsForecast._fit_parallel(self)
    938     future = executor.apply_async(ga.fit, (self.models,))
    939     futures.append(future)
--> 940 fm = np.vstack([f.get() for f in futures])
    941 return fm

Complete error message is below:

/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:896: RuntimeWarning: divide by zero encountered in log return 0.5 * np.log(res)

RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
  File "/anaconda/envs/jupyter_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py", line 73, in fit
    fm[i, i_model] = new_model.fit(y=y, X=X)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/models.py", line 246, in fit
    self.model = auto_arima_f(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py", line 2140, in auto_arima_f
    k, bestfit, improved = try_params(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py", line 2112, in try_params
    fit = p_myarima(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py", line 1237, in myarima
    fit = arima(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py", line 931, in arima
    res = minimize(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_minimize.py", line 687, in minimize
    res = _minimize_bfgs(fun, x0, args, jac, callback, **options)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 1322, in _minimize_bfgs
    _line_search_wolfe12(f, myfprime, xk, pk, gfk,
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 1100, in _line_search_wolfe12
    ret = line_search_wolfe1(f, fprime, xk, pk, gfk,
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_linesearch.py", line 84, in line_search_wolfe1
    stp, fval, old_fval = scalar_search_wolfe1(
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_linesearch.py", line 160, in scalar_search_wolfe1
    phi1 = phi(stp)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_linesearch.py", line 75, in phi
    return f(xk + s*pk, *args)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 267, in fun
    self._update_fun()
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 251, in _update_fun
    self._update_fun_impl()
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 155, in update_fun
    self.f = fun_wrapped(self.x)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in fun_wrapped
    fx = fun(np.copy(x), *args)
  File "/anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py", line 894, in arma_css_op
    res, resid = arima_css(x, arma, phi, theta, ncond)
ZeroDivisionError: division by zero
"""

The above exception was the direct cause of the following exception:

ZeroDivisionError                         Traceback (most recent call last)
Cell In[4], line 41
     28 sf= StatsForecast(
     29     # _short,
     30     models=models,
     31     freq=cur_freq,
     32     n_jobs=-1
     33 )
     36 #https://github.com/Nixtla/statsforecast/issues/182
     37 #constant = 10
     38 # add constant to avoid errors
     39 #y_train['y'] += constant
---> 41 fcst = sf.fit(df=y_train)
     42 y_hat = fcst.predict(h=fh_len) # , level=[90])
     43 #y_hat[['AutoARIMA', 'AutoETS', 'Naive', 'SeasonalNaive']] -= constant
     44 #forecast_all = fcst.merge(fcst, how = 'left', on = ['unique_id', 'ds'])

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:581, in _StatsForecast.fit(self, df, sort_df)
    579     self.fitted_ = self.ga.fit(models=self.models)
    580 else:
--> 581     self.fitted_ = self._fit_parallel()
    582 return self

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:940, in _StatsForecast._fit_parallel(self)
    938     future = executor.apply_async(ga.fit, (self.models,))
    939     futures.append(future)
--> 940 fm = np.vstack([f.get() for f in futures])
    941 return fm

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:940, in <listcomp>(.0)
    938     future = executor.apply_async(ga.fit, (self.models,))
    939     futures.append(future)
--> 940 fm = np.vstack([f.get() for f in futures])
    941 return fm

File /anaconda/envs/jupyter_env/lib/python3.8/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
    769     return self._value
    770 else:
--> 771     raise self._value
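
A note on reading the traceback above: the frame at core.py:940 is only where the parent process collects results. The remote traceback shows the ZeroDivisionError was raised inside a worker (in arima_css), and ApplyResult.get() re-raised it in the parent. The sketch below illustrates that behaviour under stated assumptions: it uses a thread pool as a stand-in for the process pool, and failing_fit and collect are hypothetical names, not part of statsforecast.

```python
from multiprocessing.pool import ThreadPool


def failing_fit(n):
    # Hypothetical stand-in for a model fit that divides by zero.
    return 1 / n


def collect(values):
    # Submit one task per value, then gather results the way
    # `[f.get() for f in futures]` does in the traceback.
    results = []
    with ThreadPool(2) as pool:
        futures = [pool.apply_async(failing_fit, (v,)) for v in values]
        for future in futures:
            try:
                results.append(future.get())
            except ZeroDivisionError as exc:
                # The exception was raised in the worker; get() re-raises it here.
                results.append(f"re-raised by get(): {exc}")
    return results


outcome = collect([1, 0])
print(outcome)
```

So the line to investigate is the last frame of the remote traceback, not the get() call.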

AzulGarza commented 1 year ago

hey @kiransview and @iamyihwa! Would you happen to have an example to reproduce the error? It could be a small dataset.

Thanks for pointing out a possible solution @speedyb74. :)

AzulGarza commented 1 year ago

Example provided by @iamyihwa in #182:

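import numpy as np
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

# Note: season_length, cur_freq and horizon_length are not defined in the original post.
# A single weekly series of 139 points that is zero everywhere except one spike.
data = {'unique_id': [1]*139,
        'ds': pd.date_range(start='2019-01-01', end='2021-09-01', freq='W'),
        'y': np.array([0.0]*4 + [19.68] + [0.0]*134)}
test_df = pd.DataFrame(data)

fcst = StatsForecast(df=test_df,
                     models=[AutoARIMA(season_length=season_length)],
                     freq=cur_freq,
                     n_jobs=-1)
Y_hat_df = fcst.forecast(h=horizon_length, fitted=True)

iamyihwa commented 1 year ago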

@FedericoGarza I have tried this hack (adding a constant before the prediction and subtracting it afterwards). I still get the same warning, but a prediction is made.
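
For readers unfamiliar with the hack from #182, here is a minimal sketch of the shift-and-unshift pattern. It is not the statsforecast API: naive_forecast is a hypothetical stand-in model that repeats the last value, and the offset of 10 is taken from the commented-out code earlier in this thread.

```python
import numpy as np
import pandas as pd

CONSTANT = 10.0  # offset used in the thread; any value here is an assumption


def naive_forecast(y, h):
    # Hypothetical stand-in for a real model: repeat the last observation.
    return np.repeat(y[-1], h)


def forecast_with_offset(df, h, constant=CONSTANT):
    # Shift the target up before fitting...
    shifted = df["y"].to_numpy(dtype=float) + constant
    y_hat = naive_forecast(shifted, h)
    # ...and shift the forecasts back down afterwards.
    return y_hat - constant


df = pd.DataFrame({"y": [0.0, 0.0, 5.0]})
y_hat = forecast_with_offset(df, h=2)
print(y_hat)
```

Adding a constant changes the level of the series but not the differences between its points, so it would not be expected to help if the failure comes from the spread of the values rather than from the zeros themselves.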

Warning generated: /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:896: RuntimeWarning: divide by zero encountered in log return 0.5 * np.log(res)
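
A minimal sketch of why this shows up as a warning rather than an exception: NumPy's log of zero returns -inf and emits a RuntimeWarning in the "divide" category. The function half_log is hypothetical and only mirrors the expression quoted in the warning. That res is exactly zero when the warning fires is an assumption.

```python
import warnings

import numpy as np


def half_log(res):
    # Mirrors the expression from the warning: 0.5 * np.log(res)
    return 0.5 * np.log(res)


# With default settings NumPy warns and returns -inf instead of raising.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = half_log(np.float64(0.0))

# The "divide" category can be silenced explicitly with np.errstate.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("always")
    with np.errstate(divide="ignore"):
        quiet_value = half_log(np.float64(0.0))

print(value, [str(w.message) for w in caught], len(silenced))
```

Note that np.errstate(invalid="ignore") covers a different category and does not suppress this particular warning.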

iamyihwa commented 1 year ago

However in some cases, this doesn't solve all the issues with the sparse dataset.

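import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoETS, AutoARIMA, Naive, SeasonalNaive

# season_length is not defined in the original post.
# 30 zeros followed by 1.0 and 158.0, at monthly frequency.
temp_df = pd.DataFrame({'unique_id': [1]*32,
                        'ds': pd.date_range(start = '1919-04-30', periods = 32, freq = 'M'),
                        'y': [0]*30 + [1.0, 158.0]})

fcst = StatsForecast(df = temp_df,
                    models=[
                    AutoARIMA(season_length = season_length),
                    #AutoETS(season_length = 12)
                    ],
                    freq='M',
                    #fallback_model = Naive,
                    n_jobs=-1)
fcst.fit()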
Error message:


ZeroDivisionError                         Traceback (most recent call last)
Cell In[52], line 14
      5 #temp_df['y'] += 10
      6 fcst = StatsForecast(df = temp_df,
      7                     models=[
      8                     AutoARIMA(season_length = season_length),
   (...)
     12                     #fallback_model = Naive,
     13                     n_jobs=-1)
---> 14 fcst.fit()

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:579, in _StatsForecast.fit(self, df, sort_df)
    577 self._prepare_fit(df, sort_df)
    578 if self.n_jobs == 1:
--> 579     self.fitted_ = self.ga.fit(models=self.models)
    580 else:
    581     self.fitted_ = self._fit_parallel()

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/core.py:73, in GroupedArray.fit(self, models)
     71 for i_model, model in enumerate(models):
     72     new_model = model.new()
---> 73     fm[i, i_model] = new_model.fit(y=y, X=X)
     74 return fm

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/models.py:246, in AutoARIMA.fit(self, y, X)
    228 """Fit the AutoARIMA model.
    229
    230 Fit an AutoARIMA to a time series (numpy array) y
   (...)
    243     AutoARIMA fitted model.
    244 """
    245 with np.errstate(invalid="ignore"):
--> 246     self.model = auto_arima_f(
    247         x=y,
    248         d=self.d,
    249         D=self.D,
    250         max_p=self.max_p,
    251         max_q=self.max_q,
    252         max_P=self.max_P,
    253         max_Q=self.max_Q,
    254         max_order=self.max_order,
    255         max_d=self.max_d,
    256         max_D=self.max_D,
    257         start_p=self.start_p,
    258         start_q=self.start_q,
    259         start_P=self.start_P,
    260         start_Q=self.start_Q,
    261         stationary=self.stationary,
    262         seasonal=self.seasonal,
    263         ic=self.ic,
    264         stepwise=self.stepwise,
    265         nmodels=self.nmodels,
    266         trace=self.trace,
    267         approximation=self.approximation,
    268         method=self.method,
    269         truncate=self.truncate,
    270         xreg=X,
    271         test=self.test,
    272         test_kwargs=self.test_kwargs,
    273         seasonal_test=self.seasonal_test,
    274         seasonal_test_kwargs=self.seasonal_test_kwargs,
    275         allowdrift=self.allowdrift,
    276         allowmean=self.allowmean,
    277         blambda=self.blambda,
    278         biasadj=self.biasadj,
    279         parallel=self.parallel,
    280         num_cores=self.num_cores,
    281         period=self.season_length,
    282     )
    283 return self

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:2070, in auto_arima_f(x, d, D, max_p, max_q, max_P, max_Q, max_order, max_d, max_D, start_p, start_q, start_P, start_Q, stationary, seasonal, ic, stepwise, nmodels, trace, approximation, method, truncate, xreg, test, test_kwargs, seasonal_test, seasonal_test_kwargs, allowdrift, allowmean, blambda, biasadj, parallel, num_cores, period)
   2068 p = int(max_p > 0)
   2069 P = int(m > 1 and max_P > 0)
-> 2070 fit = p_myarima(
   2071     order=(p, d, 0),
   2072     seasonal={"order": (P, D, 0), "period": m},
   2073 )
   2074 results[k + 1] = (p, d, 0, P, D, 0, constant, fit["ic"])
   2075 if fit["ic"] < bestfit["ic"]:

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:1237, in myarima(x, order, seasonal, constant, ic, trace, approximation, offset, xreg, method, **kwargs)
   1235 else:
   1236     if use_season:
-> 1237         fit = arima(
   1238             x, order, seasonal, include_mean=constant, method=method, xreg=xreg
   1239         )
   1240     else:
   1241         fit = arima(x, order, include_mean=constant, method=method, xreg=xreg)

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:959, in arima(x, order, seasonal, xreg, include_mean, transform_pars, fixed, init, method, SSinit, optim_method, kappa, tol, optim_control)
    957     init[ind] = maInvert(init[ind])
    958 trarma = arima_transpar(init, arma, transform_pars)
--> 959 mod = make_arima(trarma[0], trarma[1], Delta, kappa, SSinit)
    960 if no_optim:
    961     res = OptimResult(
    962         True,
    963         0,
   (...)
    966         np.array([]),
    967     )

File /anaconda/envs/jupyter_env/lib/python3.8/site-packages/statsforecast/arima.py:422, in make_arima(phi, theta, delta, kappa, tol)
    420 def make_arima(phi, theta, delta, kappa=1e6, tol=np.finfo(np.float64).eps):
    421     keys = ["phi", "theta", "delta", "Z", "a", "P", "T", "V", "h", "Pn"]
--> 422     res = _make_arima(phi, theta, delta, kappa, tol)
    423     return dict(zip(keys, res))

ZeroDivisionError: division by zero

The same code with a constant added gives the same error:


import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import AutoETS, AutoARIMA, Naive, SeasonalNaive

# season_length is not defined in the original post.
temp_df = pd.DataFrame({'unique_id': [1]*32,
                        'ds': pd.date_range(start = '1919-04-30', periods = 32, freq = 'M'),
                        'y': [0]*30 + [1.0, 158.0]})

# Add a constant to the target (the workaround from #182).
temp_df['y'] += 10
fcst = StatsForecast(df = temp_df,
                    models=[
                    AutoARIMA(season_length = season_length),
                    #AutoETS(season_length = 12)
                    ],
                    freq='M',
                    #fallback_model = Naive,
                    n_jobs=-1)
fcst.fit()

iamyihwa commented 1 year ago

The problem seems to occur when the data points differ by orders of magnitude.

It doesn't happen when the two points are of the same order of magnitude.

image

It happens when the two points are of different orders of magnitude.

image

I also saw AutoETS fail even when the dataset was not sparse, as long as the y values were of different orders of magnitude.
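
One way to check a series for the two conditions discussed in this thread (sparsity and a wide range of magnitudes) before fitting is sketched below. The helper magnitude_report is hypothetical and not part of statsforecast, and no threshold for what counts as "too sparse" or "too wide" is implied.

```python
import numpy as np


def magnitude_report(y):
    # Summarise how sparse a series is and how many orders of
    # magnitude its non-zero values span.
    y = np.asarray(y, dtype=float)
    nonzero = np.abs(y[y != 0])
    zero_fraction = float(np.mean(y == 0))
    if nonzero.size == 0:
        log10_span = 0.0
    else:
        log10_span = float(np.log10(nonzero.max() / nonzero.min()))
    return {"zero_fraction": zero_fraction, "log10_span": log10_span}


# The failing series from this thread: 30 zeros, then 1.0 and 158.0.
report = magnitude_report([0] * 30 + [1.0, 158.0])
print(report)
```

For the series above, most points are zero and the two non-zero values differ by a bit more than two orders of magnitude.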