Open papamarkou opened 10 years ago
I've actually been poking at this stuff over the past couple of days, for the first time in a long while...
R's arima() function is implemented as a state-space model, which allows it to deal with series with missing data. That was part of my motivation for doing the Kalman filter last year. The ar() function in R is implemented differently, and gives you a choice of a few fitting algorithms. I've got some skeleton code for constructing and fitting ARIMA models as state-space models via maximum likelihood, which I can push tonight after work.
Definitely worth discussing the big-picture approach to take. I'm used to R's time-series modeling, so that's my point of reference.
Sam
On Mon, Feb 10, 2014 at 12:38 PM, Theodore Papamarkou < notifications@github.com> wrote:
Hi @milktrader https://github.com/milktrader. I am sorry I haven't had much time to get more actively involved in TimeModels. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.
I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?
Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15 .
Hi! Of course R implementation was my first thought too, but you also can look to python implementation: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py
Also I'm sorry I hadn't time to merge my garch code before, I think I'll have a time to do it this week finally
That's great @ElOceanografo, @AndreyKolev, it would be amazing if you manage to merge arima and/or garch! There are a few points for discussion, for example if we are going to deal with missing values (I imagine yes), and therefore if we are going to work with data arrays or time arrays or float64 arrays, but these are not so urgent. As long as we have some Julia code for arima or garch to work with, we can easily change these later.
Possibly ping @carljv about model hierarchy.
Heads up on TimeSeries. It no longer supports DataFrames/DataArrays but instead has implemented the TimeArray type.
immutable TimeArray{T,N} <: AbstractTimeArray
timestamp::Vector{Date{ISOCalendar}}
values::Array{T,N}
colnames::Vector{ASCIIString}
function TimeArray(timestamp::Vector{Date{ISOCalendar}}, values::Array{T,N}, colnames::Vector{ASCIIString})
nrow, ncol = size(values, 1), size(values, 2)
nrow != size(timestamp, 1) ? error("values must match length of timestamp"):
ncol != size(colnames,1) ? error("column names must match width of array"):
timestamp != unique(timestamp) ? error("there are duplicate dates"):
~(flipud(timestamp) == sort(timestamp) || timestamp == sort(timestamp)) ? error("dates are mangled"):
flipud(timestamp) == sort(timestamp) ?
new(flipud(timestamp), flipud(values), colnames):
new(timestamp, values, colnames)
end
end
The inner constructor looks like a lot of code but simply enforces obvious invariants and orders the data from youngest to oldest. Overall, the design's goals are simplicity speed. It would be interesting to take it through the paces with time models and see where the weaknesses are.
Thanks @milktrader. Two questions, as I look at the TimeArray
definition only now for the first time:
timestamp
field hold? If you can provide a very simple example of creating a timearray, this will help us use it in TimeModels
.Cheers!
It does not yet deal with missing values. That missing piece is being explored in the DataArrays branch. The creation of a TimeArray is straightforward.
julia> mydates = [today():days(1):date(2014,3,1)]
20-element Array{Date{ISOCalendar},1}:
2014-02-10
2014-02-11
2014-02-12
2014-02-13
2014-02-14
2014-02-15
⋮
2014-02-24
2014-02-25
2014-02-26
2014-02-27
2014-02-28
2014-03-01
julia> myarray = [rand(20) rand(20)];
julia> mycolnames = ["r1", "r2"];
julia> mytimearray = TimeArray(mydates, myarray, mycolnames)
20x2 TimeArray{Float64,2} 2014-02-10 to 2014-03-01
r1 r2
2014-02-10 | 0.58 0.13
2014-02-11 | 0.81 0.64
2014-02-12 | 0.18 0.07
2014-02-13 | 0.54 0.07
...
2014-02-25 | 0.42 0.99
2014-02-26 | 0.49 0.47
2014-02-27 | 0.77 0.19
2014-02-28 | 0.97 0.55
2014-03-01 | 0.45 0.72
The show
method is still brittle and doesn't support long column names.
That's really helpful. One more question; is there any constructor that allows leaving the timestamp empty or to some default value if the user of a time series model is not so much interested in the timestamp
field, but mostly or only wants to look at the values
field from a modelling point of view?
If you don't mind cloning the MarketData
package you get some const
objects to play with.
julia> Pkg.clone("git://github.com/JuliaQuant/MarketData.jl.git")
julia> using MarketData
julia> OHLC
16103x6 TimeArray{Float64,2} 1950-01-03 to 2013-12-31
Open High Low Close Volume Adj Cl
1950-01-03 | 16.66 16.66 16.66 16.66 1260000.00 16.66
1950-01-04 | 16.85 16.85 16.85 16.85 1890000.00 16.85
1950-01-05 | 16.93 16.93 16.93 16.93 2550000.00 16.93
1950-01-06 | 16.98 16.98 16.98 16.98 2010000.00 16.98
...
2013-12-24 | 1828.02 1833.32 1828.02 1833.32 1307630000.00 1833.32
2013-12-26 | 1834.96 1842.84 1834.96 1842.02 1982270000.00 1842.02
2013-12-27 | 1842.97 1844.89 1839.81 1841.40 2052920000.00 1841.40
2013-12-30 | 1841.47 1842.47 1838.77 1841.07 0.00 1841.07
2013-12-31 | 1842.61 1849.44 1842.41 1848.36 2312840000.00 1848.36
Yeah, the show
method needs some work
Dropping down to just an array is simple
julia> OHLC.values
16103x6 Array{Float64,2}:
16.66 16.66 16.66 16.66 1.26e6 16.66
16.85 16.85 16.85 16.85 1.89e6 16.85
16.93 16.93 16.93 16.93 2.55e6 16.93
16.98 16.98 16.98 16.98 2.01e6 16.98
17.08 17.08 17.08 17.08 2.52e6 17.08
17.03 17.03 17.03 17.03 2.16e6 17.03
⋮ ⋮
1822.92 1829.75 1822.92 1827.99 2.85154e9 1827.99
1828.02 1833.32 1828.02 1833.32 1.30763e9 1833.32
1834.96 1842.84 1834.96 1842.02 1.98227e9 1842.02
1842.97 1844.89 1839.81 1841.4 2.05292e9 1841.4
1841.47 1842.47 1838.77 1841.07 0.0 1841.07
1842.61 1849.44 1842.41 1848.36 2.31284e9 1848.36
That's good. What I had in mind is this situation; once the time series models rely on time arrays (as their input or output), the user may just have a float64 array, i.e. the values
field of the time array. Then he would have to create a time array where the timestamp would be nothing
or populated by sth as default, so as to then pass this time array to the model. Anyway, this thinking can be omitted for later, no reason to worry about all this from now, let's get first the models coded and we can discuss about such mattters later :)
@AndreyKolev I've looked into the best way to merge two repositories like these and it's actually not a simple matter. Given that the original TimeModels doesn't have a deep history and the history for GARCH is relatively short, I think the best approach is as follows (but let's discuss this before someone pulls the trigger)
.julia
directory)git pull
)$ cd TimeModels
$ git remote add GARCH ../GARCH
$ git remote update
$ git checkout -b garch
$ git merge GARCH/master
Now you have a branch called garch
in your TimeModels repository. Once this is done, whoever does this should do a PR and we can just merge it. The current TimeModels has very little code in it and nothing should get squashed that would require pesky MERGING housekeeping.
@scidom I've been creating dummy timestamp data doing this.
[date(1,1,1):years(1):date(variable_that_has_length,1,1)]
So if you want 100 observations with dummy timestamps, you'd do:
[date(1,1,1):years(1):date(100,1,1)]
@milktrader OK I can try it
@AndreyKolev I just noticed we already have a branch named garch
. It's probably best to delete it first. You should do it if you're going to do the merge. You can do it from the github.com interface or from command line git branch -D garch
should do it.
Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.
In theory yes...though I'm pretty busy with other work right now. If you're ready to dive in, go ahead; I'm not sure how much time I'll have for it in the next ~week.
The idea for the state space ARIMA-fitting procedure (in case it's not clear from my code) is:
-specify order of ARIMA model: p, d, q -make "build" function that turns a vector of length p + q (i.e., the concatenated AR and MA coefficients) into a StateSpaceModel -make an "objective" function that Kalman-filters the data with that SSM and returns the -log likelihood -optimize the objective function over the values of the AR and MA coefficients
I think we should probably have classic methods (Levinson-Durbin, OLS) available too. I don't have any immediate plans to work on those, so they're up for grabs if anyone wants to call them...
Sam
On Wed, Feb 12, 2014 at 12:59 PM, Theodore Papamarkou < notifications@github.com> wrote:
Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo https://github.com/ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.
Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15#issuecomment-34896363 .
A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.
Alright, sounds good.
On Wed, Feb 12, 2014 at 2:26 PM, Theodore Papamarkou < notifications@github.com> wrote:
A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.
Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15#issuecomment-34906159 .
Hello, why different languages returns different coefficients of an AR data? The data being the same. I have checked with R, Python and MATLAB, and coefficients are different in each languages. What is reason behind this? Thanks.
What data are you using? And are the differences substantial?
Hi @milktrader. I am sorry I haven't had much time to get more actively involved in
TimeModels
. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?