JuliaStats / TimeModels.jl

Modeling time series in Julia

Decide on data structure #19

Closed: milktrader closed this issue 10 years ago

milktrader commented 10 years ago

The three viable options are:

I can make the case for the TimeSeries TimeArray structure, but I don't want to imply that I'm not comfortable with other structures.

The deciding criterion is probably speed, followed by semantics that are easy to reason about.

papamarkou commented 10 years ago

I have dealt with a similar situation before, while working on the MCMC package. There are a few criteria to consider when choosing the main data structure. First, we shouldn't use DataFrames or DataArrays, given that, at least at the beginning, we are not going to support NA. From that point of view, a TimeArray seems good to me, since it doesn't deal with NA either.

In the MCMC package the main data structure is called MCMCChain. It is a composite type with several fields, some of which are not always used. As a result, several users asked whether some of the package's methods could be offered in a simpler form, so that they could pass a plain vector as input. I made this change, and then created function wrappers that take an MCMCChain as input and call the lower-level functions that operate on vectors.

We should follow the same approach here, both to avoid the overhead of defining a timestamp when it is not needed and, most importantly, to make life easier for users. Vectors and TimeArrays will coexist and both will be supported, with the TimeArray methods defined on top of the vector methods. For example, once myfunction(x::Vector{Float64}) is defined, myfunction(x::TimeArray) = myfunction(x.values) does the job. Experience has shown that this makes the code cleaner and the package easier to use.
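A minimal sketch of this wrapper pattern, written in current Julia syntax. SimpleTimeArray is a hypothetical stand-in for the TimeSeries TimeArray (only a timestamp and a values field are modeled; the real type carries more), and acf1, a lag-1 sample autocorrelation, is a hypothetical example of a method defined on plain vectors:

```julia
using Dates

# Hypothetical stand-in for TimeSeries.TimeArray: only the two fields
# the wrapper pattern needs.
struct SimpleTimeArray
    timestamp::Vector{Date}
    values::Vector{Float64}
end

# Core method: works on a plain vector, no timestamps required.
# Computes the lag-1 sample autocorrelation.
function acf1(x::Vector{Float64})
    n = length(x)
    n < 2 && throw(ArgumentError("need at least two observations"))
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in 1:n-1)
    den = sum((xi - m)^2 for xi in x)
    return num / den
end

# Thin wrapper: ignore the timestamps and delegate to the vector method.
acf1(ta::SimpleTimeArray) = acf1(ta.values)

ta = SimpleTimeArray(Date(2014, 1, 1) .+ Day.(0:3), [1.0, 2.0, 3.0, 4.0])
acf1(ta)   # → 0.25, identical to acf1(ta.values)
```

All of the numerical work lives in the vector method; the wrapper only adds dispatch on the richer type, so users without timestamps can call acf1 directly on a Vector{Float64}.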

milktrader commented 10 years ago

None of the methods implemented so far accept anything other than vectors (presumably of Float64), and common time modeling functions make a point of removing NAs, so I'll close this issue for now. The TimeSeries TimeArray structure has a values field that is easy enough to pass to the existing methods in this package (e.g. foo.values).
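In current Julia, the NA value from DataArrays has been superseded by the built-in missing. Assuming that, here is a minimal sketch of the pattern described above: a hypothetical demean method that only accepts clean Float64 vectors, plus a hypothetical convenience method that drops missing values before delegating to it.

```julia
# Hypothetical core method: assumes clean Float64 input with no missing values.
demean(x::Vector{Float64}) = x .- sum(x) / length(x)

# Hypothetical convenience method: drop missing observations, then delegate.
# Caution: dropping values also drops their positions in time, which matters
# for models that assume evenly spaced observations.
demean(x::AbstractVector{Union{Missing, Float64}}) =
    demean(collect(Float64, skipmissing(x)))

y = [1.0, missing, 3.0]   # Vector{Union{Missing, Float64}}
demean(y)                 # → [-1.0, 1.0]
```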