Nixtla / statsforecast

Lightning ⚡️ fast forecasting with statistical and econometric models.
https://nixtlaverse.nixtla.io/statsforecast
Apache License 2.0
3.99k stars 283 forks source link

[ARIMA] Invalid memory accesses, integer overflows #937

Open filipcacky opened 3 days ago

filipcacky commented 3 days ago

What happened

Fitting a SARIMA model with long seasonality and include_mean==True may cause invalid memory accesses and iterator overflows.

The void getQ0(const py::array_t<double> phiv, const py::array_t<double> thetav, py::array_t<double> resv) (and all other c++ functions) uses int as an iterator, std::vector<T>::size() is always downcasted to int.

The function creates a vector of size==nrbar.

  int r = std::max(p, q + 1);
  int np = r * (r + 1) / 2;
  int nrbar = np * (np - 1) / 2;

This puts a limit on seasonal_length to about ~330, after which the integer overflows, making all code after UB. If it overflows to negative, an exception is called, if it overflows to positive, the code will eventually segfault on invalid accesses in inclu2, which are not checked.

I recommend removing all of the static_cast<int>(vector::size()), using size_t instead on all index/iterator variables.

Using std containers instead of raw pointers, example:

void inclu2(int np, const double *xnext, double *xrow, double ynext, double *d, double *rbar, double *thetab);

Should really be:

void inclu2(const std::vector<double>& xnext, std::vector<double>& xrow, double ynext, std::span<double> d, std::vector<double>& rbar, std::vector<double>& thetab) {
  const size_t np = xnext.size();
  ASSERT(np == xrow.size());
  ...
}

While std::vector::operator [] doesn't do bounds checking, most library implementations implement it via std::vector::at in debug builds, which does. This would have no negative performance implications.

Also, is there any way the space complexity for getQ0 could be reduced? ~O(n**4) seems a bit excessive, I'm not really familiar with the algos used.

I would be happy to help with this myself 🙂 Although I'm a bit strapped for time, so it might take a while.

Versions / Dependencies

Any statsforecast version after commit 4bd4364 feat: migrate arima to c++ (#895)

Reproducible example

import numpy as np
from statsforecast.models import ARIMA

model = ARIMA(
    order=(1, 0, 0),
    seasonal_order=(1, 0, 0),
    season_length=660,
    include_mean=True,
)

model.fit(np.zeros(12000))

Issue Severity

Low: It annoys or frustrates me.

jmoralez commented 3 days ago

Hey, thanks a lot for the detailed report and sorry for the troubles. Initially I was using ctypes so everything was c arrays and when I changed to pybind11 I didn't properly update everything, I'll take some time to clean up this code.

filipcacky commented 2 days ago

@jmoralez Hey, i actually got a bit of time to take a look at it and refactored the code a bit :) Can't help with the O(n**4) situation though.

jmoralez commented 2 days ago

Awesome, thanks a lot! I'll review it shortly.