Open filipcacky opened 3 days ago
Hey, thanks a lot for the detailed report and sorry for the trouble. Initially I was using ctypes, so everything was C arrays, and when I changed to pybind11 I didn't properly update everything. I'll take some time to clean up this code.
@jmoralez Hey, I actually got a bit of time to take a look at it and refactored the code a bit :) Can't help with the O(n**4) situation though.
Awesome, thanks a lot! I'll review it shortly.
What happened
Fitting a SARIMA model with long seasonality and `include_mean==True` may cause invalid memory accesses and index overflows.

`void getQ0(const py::array_t<double> phiv, const py::array_t<double> thetav, py::array_t<double> resv)` (and all the other C++ functions) uses `int` for its index variables, and `std::vector<T>::size()` is always cast down to `int`.

The function creates a vector of `size==nrbar`. This limits `seasonal_length` to roughly 330, after which the integer overflows, making everything after that point undefined behavior (UB). If it overflows to a negative value, an exception is thrown; if it overflows to a positive value, the code eventually segfaults on invalid accesses in `inclu2`, which are not checked.

I recommend removing all of the `static_cast<int>(vector::size())` casts and using `size_t` for all index/iterator variables.

Using std containers instead of raw pointers, example:
Should really be:
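As a hypothetical before/after sketch of this kind of change (the function names `sum_raw` and `sum_vec` are illustrative assumptions and are not taken from statsforecast), the difference between a raw pointer with an `int` index and a container with a `size_t` index might look like this:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical "before": raw pointer plus an int length and an int index.
// Nothing ties n to the real buffer size, and n cannot exceed INT_MAX.
double sum_raw(const double* data, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Hypothetical "after": the container carries its own size, and the index
// has the same unsigned type as size(), so no narrowing cast is needed.
double sum_vec(const std::vector<double>& data) {
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
    }
    return total;
}
```

In the second version there is no `static_cast<int>` at all, so the loop bound cannot silently wrap for very large vectors.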
While `std::vector::operator[]` doesn't do bounds checking, most standard library implementations can check it in debug builds, much like `std::vector::at` does. This would have no negative performance implications in release builds.

Also, is there any way the space complexity of `getQ0` could be reduced? ~O(n**4) seems a bit excessive, though I'm not really familiar with the algorithms used.

I would be happy to help with this myself 🙂 Although I'm a bit strapped for time, so it might take a while.
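To put rough numbers on the growth, here is a minimal sketch that computes `nrbar` in 64 bits. It assumes the sizing formulas of the reference `getQ0` in R's `arima.c` (`np = r * (r + 1) / 2`, `nrbar = np * (np - 1) / 2`, with `r = max(p, q + 1)`); whether the statsforecast port uses exactly these formulas is an assumption, and the helper names `nrbar_for` and `nrbar_fits_in_int32` are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Assumed sizing formulas (taken from the reference getQ0 in R's arima.c):
//   np    = r * (r + 1) / 2
//   nrbar = np * (np - 1) / 2
// where r = max(p, q + 1) for the expanded AR/MA orders p and q.
// Computed in 64 bits so the arithmetic does not overflow for realistic r.
std::uint64_t nrbar_for(std::uint64_t r) {
    const std::uint64_t np = r * (r + 1) / 2;
    return np * (np - 1) / 2;
}

// True if nrbar for this r can be stored in a 32-bit signed int.
bool nrbar_fits_in_int32(std::uint64_t r) {
    return nrbar_for(r) <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}
```

Under these assumptions `nrbar` grows like r**4 / 8: `nrbar_for(330)` is 1,491,371,805, which is already about 11.9 GB of doubles, and the value stops fitting in a 32-bit signed `int` at r = 362. The threshold seen in practice can be lower, since it depends on the model orders and on whether intermediate products such as `np * (np - 1)` are also computed in `int`.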
Versions / Dependencies
Any `statsforecast` version after commit 4bd4364, "feat: migrate arima to c++ (#895)".

Reproducible example
Issue Severity
Low: It annoys or frustrates me.