JuliaStats / TimeModels.jl

Modeling time series in Julia
Other
57 stars 28 forks source link

AR(IMA) implementation #15

Open papamarkou opened 10 years ago

papamarkou commented 10 years ago

Hi @milktrader. I am sorry I haven't had much time to get more actively involved in TimeModels. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?

ElOceanografo commented 10 years ago

I've actually been poking at this stuff over the past couple of days, for the first time in a long while...

R's arima() function is implemented as a state-space model, which allows it to deal with series with missing data. That was part of my motivation for doing the Kalman filter last year. The ar() function in R is implemented differently, and gives you a choice of a few fitting algorithms. I've got some skeleton code for constructing and fitting ARIMA models as state-space models via maximum likelihood, which I can push tonight after work.

Definitely worth discussing the big-picture approach to take. I'm used to R's time-series modeling, so that's my point of reference.

Sam

On Mon, Feb 10, 2014 at 12:38 PM, Theodore Papamarkou < notifications@github.com> wrote:

Hi @milktrader https://github.com/milktrader. I am sorry I haven't had much time to get more actively involved in TimeModels. I wanted to ask you what would be the best way to create a model hierarchy in order to implement ARIMA. This won't be useful only to those who work with time series, it will be very beneficial for MCMC too given that several summary statistics for MCMC rely on AR(p) models.

I would be happy to try implement ARIMA or to help with it. Do you have any recommendations from which language we could use some code as prototype for ARIMA or maybe at least AR?

Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15 .

AndreyKolev commented 10 years ago

Hi! Of course R implementation was my first thought too, but you also can look to python implementation: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/arima_model.py

Also I'm sorry I hadn't time to merge my garch code before, I think I'll have a time to do it this week finally

papamarkou commented 10 years ago

That's great @ElOceanografo, @AndreyKolev, it would be amazing if you manage to merge arima and/or garch! There are a few points for discussion, for example if we are going to deal with missing values (I imagine yes), and therefore if we are going to work with data arrays or time arrays or float64 arrays, but these are not so urgent. As long as we have some Julia code for arima or garch to work with, we can easily change these later.

milktrader commented 10 years ago

Possibly ping @carljv about model hierarchy.

Heads up on TimeSeries. It no longer supports DataFrames/DataArrays but instead has implemented the TimeArray type.

immutable TimeArray{T,N} <: AbstractTimeArray

    timestamp::Vector{Date{ISOCalendar}}
    values::Array{T,N}
    colnames::Vector{ASCIIString}

    function TimeArray(timestamp::Vector{Date{ISOCalendar}}, values::Array{T,N}, colnames::Vector{ASCIIString})
        nrow, ncol = size(values, 1), size(values, 2)
        nrow != size(timestamp, 1) ? error("values must match length of timestamp"):
        ncol != size(colnames,1) ? error("column names must match width of array"):
        timestamp != unique(timestamp) ? error("there are duplicate dates"):
        ~(flipud(timestamp) == sort(timestamp) || timestamp == sort(timestamp)) ? error("dates are mangled"):
        flipud(timestamp) == sort(timestamp) ? 
        new(flipud(timestamp), flipud(values), colnames):
        new(timestamp, values, colnames)
    end
end

The inner constructor looks like a lot of code but simply enforces obvious invariants and orders the data from youngest to oldest. Overall, the design's goals are simplicity speed. It would be interesting to take it through the paces with time models and see where the weaknesses are.

papamarkou commented 10 years ago

Thanks @milktrader. Two questions, as I look at the TimeArray definition only now for the first time:

Cheers!

milktrader commented 10 years ago

It does not yet deal with missing values. That missing piece is being explored in the DataArrays branch. The creation of a TimeArray is straightforward.

julia> mydates = [today():days(1):date(2014,3,1)]
20-element Array{Date{ISOCalendar},1}:
 2014-02-10
 2014-02-11
 2014-02-12
 2014-02-13
 2014-02-14
 2014-02-15
 ⋮
 2014-02-24
 2014-02-25
 2014-02-26
 2014-02-27
 2014-02-28
 2014-03-01

julia> myarray = [rand(20) rand(20)];

julia> mycolnames = ["r1", "r2"];

julia> mytimearray = TimeArray(mydates, myarray, mycolnames)
20x2 TimeArray{Float64,2} 2014-02-10 to 2014-03-01

             r1    r2
2014-02-10 | 0.58  0.13
2014-02-11 | 0.81  0.64
2014-02-12 | 0.18  0.07
2014-02-13 | 0.54  0.07
...
2014-02-25 | 0.42  0.99
2014-02-26 | 0.49  0.47
2014-02-27 | 0.77  0.19
2014-02-28 | 0.97  0.55
2014-03-01 | 0.45  0.72

The show method is still brittle and doesn't support long column names.

papamarkou commented 10 years ago

That's really helpful. One more question; is there any constructor that allows leaving the timestamp empty or to some default value if the user of a time series model is not so much interested in the timestamp field, but mostly or only wants to look at the values field from a modelling point of view?

milktrader commented 10 years ago

If you don't mind cloning the MarketData package you get some const objects to play with.

julia> Pkg.clone("git://github.com/JuliaQuant/MarketData.jl.git")

julia> using MarketData

julia> OHLC
16103x6 TimeArray{Float64,2} 1950-01-03 to 2013-12-31

             Open     High     Low      Close    Volume          Adj Cl
1950-01-03 | 16.66  16.66  16.66  16.66  1260000.00  16.66
1950-01-04 | 16.85  16.85  16.85  16.85  1890000.00  16.85
1950-01-05 | 16.93  16.93  16.93  16.93  2550000.00  16.93
1950-01-06 | 16.98  16.98  16.98  16.98  2010000.00  16.98
...
2013-12-24 | 1828.02  1833.32  1828.02  1833.32  1307630000.00  1833.32
2013-12-26 | 1834.96  1842.84  1834.96  1842.02  1982270000.00  1842.02
2013-12-27 | 1842.97  1844.89  1839.81  1841.40  2052920000.00  1841.40
2013-12-30 | 1841.47  1842.47  1838.77  1841.07  0.00  1841.07
2013-12-31 | 1842.61  1849.44  1842.41  1848.36  2312840000.00  1848.36

Yeah, the show method needs some work

milktrader commented 10 years ago

Dropping down to just an array is simple

julia> OHLC.values
16103x6 Array{Float64,2}:
   16.66    16.66    16.66    16.66  1.26e6       16.66
   16.85    16.85    16.85    16.85  1.89e6       16.85
   16.93    16.93    16.93    16.93  2.55e6       16.93
   16.98    16.98    16.98    16.98  2.01e6       16.98
   17.08    17.08    17.08    17.08  2.52e6       17.08
   17.03    17.03    17.03    17.03  2.16e6       17.03
    ⋮                                              ⋮
 1822.92  1829.75  1822.92  1827.99  2.85154e9  1827.99
 1828.02  1833.32  1828.02  1833.32  1.30763e9  1833.32
 1834.96  1842.84  1834.96  1842.02  1.98227e9  1842.02
 1842.97  1844.89  1839.81  1841.4   2.05292e9  1841.4
 1841.47  1842.47  1838.77  1841.07  0.0        1841.07
 1842.61  1849.44  1842.41  1848.36  2.31284e9  1848.36
papamarkou commented 10 years ago

That's good. What I had in mind is this situation; once the time series models rely on time arrays (as their input or output), the user may just have a float64 array, i.e. the values field of the time array. Then he would have to create a time array where the timestamp would be nothing or populated by sth as default, so as to then pass this time array to the model. Anyway, this thinking can be omitted for later, no reason to worry about all this from now, let's get first the models coded and we can discuss about such mattters later :)

milktrader commented 10 years ago

@AndreyKolev I've looked into the best way to merge two repositories like these and it's actually not a simple matter. Given that the original TimeModels doesn't have a deep history and the history for GARCH is relatively short, I think the best approach is as follows (but let's discuss this before someone pulls the trigger)

  1. go to the directory where both GARCH and TimeModels repositories exist (probably your .julia directory)
  2. make sure both repositories are up-to-date with master (git pull)
  3. $ cd TimeModels
  4. $ git remote add GARCH ../GARCH
  5. $ git remote update
  6. $ git checkout -b garch
  7. $ git merge GARCH/master

Now you have a branch called garch in your TimeModels repository. Once this is done, whoever does this should do a PR and we can just merge it. The current TimeModels has very little code in it and nothing should get squashed that would require pesky MERGING housekeeping.

milktrader commented 10 years ago

@scidom I've been creating dummy timestamp data doing this.

[date(1,1,1):years(1):date(variable_that_has_length,1,1)]

So if you want 100 observations with dummy timestamps, you'd do:

[date(1,1,1):years(1):date(100,1,1)]

AndreyKolev commented 10 years ago

@milktrader OK I can try it

milktrader commented 10 years ago

@AndreyKolev I just noticed we already have a branch named garch. It's probably best to delete it first. You should do it if you're going to do the merge. You can do it from the github.com interface or from command line git branch -D garch should do it.

papamarkou commented 10 years ago

Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.

ElOceanografo commented 10 years ago

In theory yes...though I'm pretty busy with other work right now. If you're ready to dive in, go ahead; I'm not sure how much time I'll have for it in the next ~week.

The idea for the state space ARIMA-fitting procedure (in case it's not clear from my code) is:

-specify order of ARIMA model: p, d, q -make "build" function that turns a vector of length p + q (i.e., the concatenated AR and MA coefficients) into a StateSpaceModel -make an "objective" function that Kalman-filters the data with that SSM and returns the -log likelihood -optimize the objective function over the values of the AR and MA coefficients

I think we should probably have classic methods (Levinson-Durbin, OLS) available too. I don't have any immediate plans to work on those, so they're up for grabs if anyone wants to call them...

Sam

On Wed, Feb 12, 2014 at 12:59 PM, Theodore Papamarkou < notifications@github.com> wrote:

Just a quick question so that I Know where we are standing with ARIMA. @ElOceanografo https://github.com/ElOceanografo is ARIMA coding for now in your realm of activity? This is perfectly fine, just want to know so that we hold off from any coding in case you are/will be working on it.

Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15#issuecomment-34896363 .

papamarkou commented 10 years ago

A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.

ElOceanografo commented 10 years ago

Alright, sounds good.

On Wed, Feb 12, 2014 at 2:26 PM, Theodore Papamarkou < notifications@github.com> wrote:

A week is more than fine - I don't think I will have time within the next 2-3 weeks. If let's say after a month none of us has made any progress, then I will try to make time to lay my hands on it.

Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/TimeModels.jl/issues/15#issuecomment-34906159 .

mishrasubhajit commented 9 years ago

Hello, why different languages returns different coefficients of an AR data? The data being the same. I have checked with R, Python and MATLAB, and coefficients are different in each languages. What is reason behind this? Thanks.

milktrader commented 9 years ago

What data are you using? And are the differences substantial?