JuliaDynamics / ARFIMA.jl

Simulate stochastic timeseries that follow ARFIMA, ARMA, ARIMA, AR, etc. processes
MIT License
51 stars 2 forks source link

Stranbo integration #11

Open gvdr opened 1 year ago

gvdr commented 1 year ago

Leaving an issue here to consider whether we might integrate the functionalities I have in Stranbo and arfima.

Stranbo is designed for generating time series with "complicated" sarimax (integer $i$) processes. Complicated means many seasonal effects (think, daily + weekly + monthly + yearly, ..., each one with its own d, ar, and ma coefficients) and/or many auxiliary components (again, each one with its own d, ar, and ma coefficients).

It is slower than ARFIMA, a little bit due to the generality, and a tad due to the fact that I don't use any @fastmath tricks (I got scared because of this kind of things, but maybe here it's safe?).

The high-picture view of how Stranbo work is:

  1. define all processes components as classes
  2. detine a process as an vector of components
  3. gather all coefficients, and expand the polynomials for x, z, and the auxiliary components using Polynomials
  4. For each x[t] 4.1 use a modified getindex to look in the past (modified so to handle negative indices without padding) 4.2 dot product coefficients and backward shifted (and truncated) series 4.3 sum up
  5. (eventually add anomalies...)

I believe that 4 is in mostly shared between the two packages (but I can learn a bit to improve even more). I don't think the polynomial stuff is in ARFIMA (I believe you have hardcoded the loops, right?).

Datseris commented 1 year ago

That all seems fine to me. They way I have done it so far does not require the existence of classes.

The default value for everything is nothing. If it is not nothing, it is a vector of coefficients. Multiple dispatch can differentiate between the two. Is there a particular reason to introduce new Types (Julia does not have classes, so I am guessing you mean Types in point 1). For example, to enable the "moving average" component of the process, you simply assign the "moving average" keyword the value of a Static Vector instead of nothing.

I don't think the polynomial stuff is in ARFIMA (I believe you have hardcoded the loops, right?).

I am not sure I understand what you mean by "hardcoded" (because I don't know what polynomial stuff means in this context). This is the main loop:

https://github.com/JuliaDynamics/ARFIMA.jl/blob/19722472ee8ad1d464f2e24388deeb7119a8d55c/src/ARFIMA.jl#L97-L102

does it count as "hardcoded"? How much I loop depends on the user input, so I am not sure whether this qualifies as hardcoded or not.

gvdr commented 1 year ago

The need for types (sorry) comes from trying to have multiple seasonality effects. Each seasonal component may have its own ar and ma coefficients. One $(2,0,3)s$ component, for example, will bring to the equation a $1-\phi_1B^s - \phi_2B^{2s}$ for the ar and a $1+\psi_1B^s + \psi_2B^{2s} + \psi_3B^{3*s}$ for the ma component (see here for example). If we want a sarima with weekly, monthly, and yearly seasonal components, we might in general need 6 new array of coefficients. An arima without seasonal component corresponds to a sarima with seasonality 1 in this fashion.

The most expressive way I could find to allow user to define potentially complicated sarimax was to define a building block type, and allowing for composition of these blocks. So, a sarima with seasonalities equal to 1, 4, 6, 9, would be written as:

this_sarima = [
sarima(ar = [...coefficients...], ma = [...coefficients...]),
sarima(ar = [...coefficients...], ma = [...coefficients...], s = 4),
sarima(ar = [...coefficients...], ma = [...coefficients...], s = 6),
sarima(ar = [...coefficients...], ma = [...coefficients...], s = 9),
]

where all the coefficients can be different (and in different number, even empty).

Datseris commented 1 year ago

Oh okay. I see. Then this means that the s component (for seasonal) should have dedicated supertype and subtypes, to handle all of this simpler. However, I stand that the AR or MA components don't need dedicated types, and one can use static vectors as I have been doing so far. Right?

gvdr commented 12 months ago

Absolutely right. I'm using static arrays to store the coefficients as well ☺️ (with a convenience constructor function that takes a regular vector and convert it to a SVector before calling the type constructor, so users don't need to deal directly with static vectors, unless they want to).

Datseris commented 12 months ago

Okay, so then what stops us from merging the two packages?

COuldn't we have something like a "main" function sarfimax that takes in s, ar, d, ma, x (with d for the integration), and just does everuthing?

gvdr commented 12 months ago

Something similar seems doable.

With the minor cavaet that, for the moment, in Stranbo the functions sarima / sarimax construct the components, they don't run the simulation (that is left to sample(VectorOfComponents)). It would be easy to a sarfimax constructor. I would need to define a sample method for the sarfimax type, which will rely on your code.

Does it make sense to have "component" which is fractionally integrated? Do fractional integration admits seasonality? That is, would the user want to write something like:

this_sarfimax = [
sarima(d = 1, ar = ..., ma = ...),
sarima(d = 1, ar = ..., ma = ..., s = 4),
sarfima(d = 0.3, ar = ..., ma = ..., s = 6)
]

sample(this_sarfimax, 100_000)

or would this be nonsensical?

Datseris commented 12 months ago

What's the advantage of not running the simulation and making the user go the extra step of calling sample? Doesn't this unecessarily complicate things? Is there anything else that is possible to do with this VectorOfComponents besides just getting the timeseries?


I don't think it makes sense to have d::Real with seasonal components. If d is not an integer and s is not nothing, we should throw an error.

gvdr commented 12 months ago

What's the advantage of not running the simulation and making the user go the extra step of calling sample? Doesn't this unecessarily complicate things?

I'm no sure. With an example, let's define a (1,1,1)x(1,1,0)5x(1,0,1)10. For me the choice seems to be between:

# my syntax
a_sarima_proc = [
sarima(d = 1, ar = [.8,.3,.2], ma = [.3,.1]), # non seasonal component
sarima(d = 1, ar = [.4], s = 5), # first seasonal effect
sarima(ar = [.5,.3], ma = [.4,.3,.1], s = 10) # second seasonal effect
]

sample(a_sarima_proc,100_000)

and

# standard syntax
sarima(
    d = [1,1,0],
    ar = [ [.8,.3,.2],  [.4], [.5,.3]],
    ma = [[.3,.1], nothing,  [.4,.3,.1]],
    s = [1,5,10]
)

For me the first is way more readable.

Is there anything else that is possible to do with this VectorOfComponents besides just getting the timeseries?

For one, it is easier to manipulate. For example, if I want to add a new seasonality effect I can just do:

new_effect = sarima(d = 2, ma = [0.4], s = 20)
new_sarima_proc = vcat(a_sarima_proc, new_effect)
sample(new_sarima_proc,100_000)

And there is some experimental feature I'm working out. So, it should be it is possible to do something like the following, to obtain 3 correlated series (eheh, I was not expecting it to just work).

using Distributions, LinearAlgebra

# we build a var-covar matrix
Σ = rand(3,3)
Σ = Σ .* Σ' 

# sample n points
WNoise = MvNormal([.0,.0,.0], I(3)+Σ)
z = rand(WNoise,100_000)

# define equivalent seasonality effects
seasonalities = [
    sarima(d = 1, ar = [.4], s = 5), # first seasonal effect
    sarima(ar = [.5,.3], ma = [.4,.3,.1], s = 10) # second seasonal effect
]

# define base processes

b1 = sarima(ar = [.5], ma = [.5], dₙ = z[1,:])
b2 = sarima(ar = [.5], ma = [.5], dₙ = z[2,:])
b3 = sarima(ar = [.5], ma = [.5], dₙ = z[3,:])

three_trajectories = sample.(
    [vcat(b1,seasonalities),
     vcat(b2,seasonalities),
     vcat(b3,seasonalities)],
     100_000
)

# try different seasonal processes

new_seasonalities = [
    sarima(ar = [.2], ma = [.1,.01,.001], s = 7), # first seasonal effect
    sarima(ar = [.5,.3], ma = [.4,.3,.1], s = 10) # second seasonal effect
]

other_three_trajectories = sample.(
    [vcat(b1,new_seasonalities),
     vcat(b2,new_seasonalities),
     vcat(b3,new_seasonalities)],
     100_000
)

I'm sure all of this can be done in other syntaxes, but I think this one is rather clear and easy to use.

Datseris commented 12 months ago

Okay, the vector syntax is fine.

But then, the only thing you can utilize for ARFIMA.jl is the fractional integration, right? But the fractional integration doesn't play well with seasonal. So the fractional integration will be used with vectors that always have only one component.

Can we define the convenience sample(::Component) dispatch as well?

And what is the plan forwards now? Where do we put all methods? Do we make a new package? SARFIMAX or something like that? Do we add the seasonal stuff here?

gvdr commented 12 months ago

Can we define the convenience sample(::Component) dispatch as well?

Sure, that's already there:

julia> using Stranbo
julia> test = sarima(ar = [.8], ma = [2.])
SARIMA{Float64}(1, 0, [0.8], [2.0], Distributions.Normal{Float64}(μ=0.0, σ=1.0))
julia> sample(test,1000)
1000-element Vector{Float64}:
 -0.3901653040936471
 -2.545535084952462
 -3.169123738677272
  1.3247596769491126
  1.1210945293432513
  1.7294814531680687
  4.1878387343420105
  ⋮
 -1.6600430819823186
 -4.6308613372374134
 -8.743526206444377
 -8.697082865177684
 -2.354005619615946
  0.013866600868192025

what is the plan forwards now? Where do we put all methods? Do we make a new package? SARFIMAX or something like that?

Open to any solution. Calling it sarfimax might mean we need to change it again if we include something else? But I'm terrible with names! :-D

the only thing you can utilize for ARFIMA.jl is the fractional integration, right?

That, and I should take a look closely at how you handle the bdp(). I've a similar operator, but I fear that is the performance bottleneck for me.

Datseris commented 12 months ago

How much of the source code here can you use? Can you add your seasonal functionality to this package? And we rename this package into AutoregressiveSimulations.jl and release it a-new? If you think using the source code here is too hard, we can start a new package AutoregressiveSimulations.jl from scratch.

package name is up to debate of course.

gvdr commented 11 months ago

Sorry, had very busy days, I'll get to this asap :-)