Open ExpandingMan opened 7 years ago
This has come up before (e.g. #250) and I think it makes sense.
With TimeAxisArrays.jl I started reproducing the functionality from this package on top of the AxisArray datatype, which supports indexing with arbitrary time units (it also comes with nice interval selections out of the box). It's almost at feature parity (see https://github.com/GordStephen/TimeAxisArrays.jl/issues/1). Unfortunately it's been a while (pre-0.5) since I've been able to work on it, so I don't know what kind of shape it's in for modern use...
I'm with @GordStephen on this. I think this functionality would be better suited for a completely different type than TimeArray.
If you want to experiment, you could change the timestamp field in the TimeArray type from TimeType to Union{TimeType, Float64}.
immutable TimeArray{T, N, D<:TimeType, A<:AbstractArray} <: AbstractTimeSeries
It wouldn't be trivial to add all the new methods dispatched on Float64 instead of TimeType, but it could be done without altering any existing methods. You might also pick and choose which methods to define.
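For anyone following along, here is a rough sketch of what relaxing the timestamp bound might look like. It uses a toy type in current Julia syntax (`struct` instead of `immutable`), not the package's actual TimeArray; the names `ToySeries` and `toy_from` are made up for illustration.

```julia
using Dates

# Toy stand-in for TimeArray (hypothetical; not the package's definition).
# The timestamp element type D may be any TimeType or Float64.
struct ToySeries{T, D<:Union{TimeType, Float64}}
    timestamp::Vector{D}
    values::Vector{T}

    function ToySeries(timestamp::Vector{D}, values::Vector{T}) where {T, D<:Union{TimeType, Float64}}
        length(timestamp) == length(values) ||
            throw(DimensionMismatch("timestamp and values must have equal length"))
        issorted(timestamp) || throw(ArgumentError("timestamps must be sorted"))
        new{T, D}(timestamp, values)
    end
end

# A method written once against the relaxed bound serves both kinds of axis.
function toy_from(s::ToySeries, t)
    keep = [ts >= t for ts in s.timestamp]
    return ToySeries(s.timestamp[keep], s.values[keep])
end

float_series = ToySeries(collect(1:10.), collect(1:10))
date_series  = ToySeries([Date(2016, 1, 1) + Day(i) for i in 0:9], collect(1:10))
```

Methods that only compare or sort timestamps carry over unchanged; anything that does date arithmetic would still need a Float64-specific method.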
It seems perfectly reasonable to make this a different type, but it seems to me that easy conversion back and forth should be imperative. Right now the issue is that most actual analysis that one would do on a time series requires the temporal axis to be treated simply as a real number, so the inability to treat the time series this way seriously curtails the usefulness of this package.
@ExpandingMan have you tried tweaking the existing type yet? I'd be interested to see what this looks like. I'd also be happy to make it a branch for this repo.
Unfortunately I haven't really had time to work on this yet. What I've been doing in the meantime is writing my package to accept pairs of Vectors (the first being the time axis, the second being the ordinate of the time series), and later I will wrap those with other functions that take a TimeArray and extract the appropriate vectors somehow. I expect this will look like something that converts, for instance, DateTimes to the number of seconds since t=0 as a float.
Ultimately, it'd be nice to do something like
convertaxis(Dates.Second, ta)
where ta is a TimeArray. Obviously the only labor-intensive part about this should be getting the rest of the API up to parity with TimeType.
I'll try setting something up when I have time, but right now my priority is working on the algorithms I need to help me work on forecasting.
I changed like 7 lines of code to get this behavior ...
julia> TimeArray(collect(1:10.), collect(1:10))
10x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 10.0
1.0 | 1
2.0 | 2
3.0 | 3
4.0 | 4
⋮
7.0 | 7
8.0 | 8
9.0 | 9
10.0 | 10
Is this what you have in mind?
Appears this small change is actually pretty robust at first pass ...
julia> foo = TimeArray(collect(1:10.), collect(1:10));
julia> bar = TimeArray(collect(1:10.), collect(101:110));
julia> head(foo)
1x1 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 1.0 to 1.0
1.0 | 1
julia> tail(bar, 3)
3x1 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 8.0 to 10.0
8.0 | 108
9.0 | 109
10.0 | 110
julia> lag(foo)
9x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 2.0 to 10.0
2.0 | 1
3.0 | 2
4.0 | 3
5.0 | 4
⋮
7.0 | 6
8.0 | 7
9.0 | 8
10.0 | 9
julia> lead(bar)
9x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 9.0
1.0 | 102
2.0 | 103
3.0 | 104
4.0 | 105
⋮
6.0 | 107
7.0 | 108
8.0 | 109
9.0 | 110
julia> foo .+ bar
10x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 10.0
.+
1.0 | 102
2.0 | 104
3.0 | 106
4.0 | 108
⋮
7.0 | 114
8.0 | 116
9.0 | 118
10.0 | 120
julia> percentchange(foo)
9x1 TimeSeries.TimeArray{Float64,1,Float64,Array{Float64,1}} 2.0 to 10.0
2.0 | 1.0
3.0 | 0.5
4.0 | 0.3333
5.0 | 0.25
⋮
7.0 | 0.1667
8.0 | 0.1429
9.0 | 0.125
10.0 | 0.1111
julia> merge(foo, bar)
10x2 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 1.0 to 10.0
_1 _2
1.0 | 1 101
2.0 | 2 102
3.0 | 3 103
4.0 | 4 104
⋮
7.0 | 7 107
8.0 | 8 108
9.0 | 9 109
10.0 | 10 110
julia> from(foo, 6.0)
5x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 6.0 to 10.0
6.0 | 6
7.0 | 7
8.0 | 8
9.0 | 9
10.0 | 10
Yes, but I think it needs to accommodate a couple of things. For one, there should be some way of easily converting between DateTime and floats (like the example I gave, something like convertaxis(Dates.Second, ta) to convert to floats, or maybe convertaxis(Dates.DateTime, DateTime(3147, 3, 27), ta) to convert to DateTime).
Again, the more difficult part will be making sure that all the existing functionality in the package also applies to floating point time (where applicable). EDIT: it looks like you already did some of this!
Thanks for looking into this!
Perhaps I'm misunderstanding what you mean, but if actual convert methods should be defined, e.g. so that Float64(::DateTime) becomes meaningful, that should happen in Base. Packages shouldn't make those kinds of extensions of Base functions/constructors using Base types.
I just pushed the float_time branch. You should be able to check out the branch and give it a try.
Sorry, I should have been clearer what I meant about "converting". There was a reason I was suggesting using a different symbol like convertaxis instead of convert.
Of course you should not be able to do convert(Float64, DateTime(3147, 3, 27)). However, what you should be able to do is, given a time that serves as t=0, compute the amount of time (in floating point, in whatever units) which has elapsed between some DateTime and t=0. This should be easy to do, so that the axis can be easily taken from something with DateTime values to something with the appropriate float values.
As a somewhat silly example, I should be able to easily take [Date(3147, 3, 27), Date(3147, 3, 28), Date(3147, 3, 29)], specify that I want units of days such that t=0 corresponds to the first date, and get [0.0, 1.0, 2.0].
Does that make sense? I'm not saying it's something that'd be difficult for us to implement, or even something that it'd be difficult for users to do themselves (the needed functions already exist in Dates, so it's actually very easy to do!), it just seems to me that this functionality would be needed so often in actual analysis that it should be built in.
Maybe this is actually so trivial that it's a silly request, but again, it's more about how commonly used I'd think this would be (for analysis purposes) than it is about how hard it is for users to duplicate themselves.
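For reference, the three-date example above can be done by hand with nothing but Dates. This is a minimal sketch assuming a current Julia release where Dates is a standard library; the variable names are only illustrative.

```julia
using Dates

dates = [Date(3147, 3, 27), Date(3147, 3, 28), Date(3147, 3, 29)]
t0 = first(dates)    # the date chosen to serve as t = 0

# Date - Date yields a Dates.Day; Dates.value extracts its integer count.
elapsed_days = [Float64(Dates.value(d - t0)) for d in dates]
# elapsed_days == [0.0, 1.0, 2.0]
```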
Well, this is nowhere near as elegant as I was hoping for, but see this gist.
Now, if you do something like
ta = TimeArray([Date(3147, 7, 1) + Dates.Day(i) for i in 0:2], rand(3))
t, x = convertaxis(Dates.Hour, ta)
for t you will get [0.0, 24.0, 48.0]. You can pass other Periods to get different conversions.
You'll probably ask why I had to bother with convert_ms and convert_day. Well, unfortunately if you do something like convert(Dates.Second, Dates.Day(1)) you will get an InexactError. This is because the Periods all store integers, not floats. Defining convert_ms and convert_day, while unfortunate, is the least elaborate way I could see to achieve the desired result.
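One way to get a float axis without separate helpers like convert_ms and convert_day might be to route everything through milliseconds and divide as floats. The sketch below is not the gist's code: `elapsed_in` and `CoarseUnit` are hypothetical names, and it assumes a current Julia release in which `convert(Millisecond, ...)` is defined for the fixed-length periods listed.

```julia
using Dates

# Fixed-length units that convert exactly to whole milliseconds.
const CoarseUnit = Union{Week, Day, Hour, Minute, Second, Millisecond}

# Hypothetical helper: elapsed time since `t0` as Float64 counts of `unit`.
# `t0` is assumed to have the same type as the timestamps.
function elapsed_in(unit::Type{<:CoarseUnit}, timestamps::AbstractVector{<:TimeType};
                    t0::TimeType = first(timestamps))
    ms_per_unit = Dates.value(convert(Millisecond, unit(1)))
    # Date differences are Days and DateTime differences are Milliseconds;
    # both convert to Millisecond, and the division happens in floating point.
    return [Dates.value(convert(Millisecond, t - t0)) / ms_per_unit for t in timestamps]
end

dates = [Date(3147, 7, 1) + Day(i) for i in 0:2]
hours = elapsed_in(Hour, dates)    # [0.0, 24.0, 48.0]
```

Variable-length units such as Month and Year are deliberately left out, since they have no fixed number of milliseconds.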
Anyway, something like this is what I thought it would be important to have for going back and forth between time series with Date or DateTime axes and time series with real axes (of course I didn't do the inverse conversion yet).
Let me know if you think this would be a good functionality to include in the float_time branch, and I'll make a PR at some point (of course I'll change this to output TimeArray).
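The inverse direction (float offsets back to DateTime) could plausibly look like the sketch below. `timestamps_from` is a hypothetical name, not something in the package or the gist, and the rounding to whole milliseconds is an assumption forced by DateTime's resolution.

```julia
using Dates

# Hypothetical inverse helper: `offsets` are counts of `unit` elapsed since `t0`.
function timestamps_from(unit::Type{<:Union{Week, Day, Hour, Minute, Second, Millisecond}},
                         t0::DateTime, offsets::AbstractVector{<:Real})
    ms_per_unit = Dates.value(convert(Millisecond, unit(1)))
    # DateTime has millisecond resolution, so anything finer is rounded away.
    return [t0 + Millisecond(round(Int, x * ms_per_unit)) for x in offsets]
end

stamps = timestamps_from(Day, DateTime(3147, 3, 27), [0.0, 1.0, 2.5])
# The last element is DateTime(3147, 3, 29, 12), i.e. noon on the third day.
```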
By all means make a PR to the float_time branch. It would be easier to see the code to make comments. GitHub also has a feature where comments can be made on each code line, which I'm quite fond of.
Since we're talking about an experimental branch, don't be concerned with elegance and efficiency right away; those lumps can always be smoothed out. Just plop something in there.
Just to let you know, I haven't really been working on this. The current method I'm using is pretty hackish, you can see it here. There are a number of annoyances when trying to do this. One is that the Julia Period types only store integers, so it doesn't make sense to do something like, for instance, convert(Minute, Second(35)). What I've done is working fine, but I don't feel like it's really making proper use of the Julia Dates package, and it isn't as general as I feel it would need to be to be appropriate for TimeSeries.jl. It might be worth waiting for v0.6, in which there will be a Dates.Time type (for storing times of day).
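To make the integer restriction concrete, here is a small check, assuming a current Julia release with Dates as a standard library. The variable names are illustrative only.

```julia
using Dates

# Periods wrap an integer count.
stores_int = Dates.value(Second(35)) isa Int64

# Finer-to-coarser conversion only succeeds when it divides evenly.
two_minutes = convert(Minute, Second(120))

# 35 seconds is not a whole number of minutes, so this conversion throws.
threw_inexact = try
    convert(Minute, Second(35))
    false
catch err
    err isa InexactError
end

# Dividing the raw integer values sidesteps the restriction.
minutes_as_float = Dates.value(Second(35)) / Dates.value(convert(Second, Minute(1)))
```

The last line is the same trick a float-valued axis needs: leave the Period types behind and work with plain numbers.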
Anyway, I'm not seriously planning to do a PR soon, as I'd need to spend a lot more time thinking about Periods. Frankly, it would probably be warranted to make some changes to the Dates package itself. I really think it would be best if Period types worked like, for instance, Day{T<:Number}, but that would require a lot of work.
If you see no other problem with the float_time branch otherwise, you may want to consider merging it into master without worrying about conversions just yet.
By the way, humans have really absurd ways of measuring time!
Agreed on how humans view time!
I'm not planning to merge the float_time branch anytime soon. There is benchmarking, documentation, and a boatload of tests that would be required. I think this is an edge case, but we can certainly keep the branch open.
Also, I'll keep this issue open to funnel discussion to this topic.
Ok, sounds good.
I just wanted to re-iterate my point about whether this is an edge case though: I really, really think that it's not. In fact, the only situation I can think of in which you wouldn't need to do this is when all the datapoints are equally spaced in time (a common use-case yes, but hardly general). Any time that is not the case you really need to know when they are occurring. It certainly has come up a lot in the work I've done on segmentation and regression, none of which is exotic by any means. (Though, perhaps the most compelling reason of all is just how horrible date-time is!)
I'm wondering how to close this; maybe we can note in the documentation that this branch exists. Also, who has the git-fu to keep updating the branch with changes from master?
git checkout float_time
git fetch origin
git rebase origin/master
# Fix any merge conflicts manually
# A rebase rewrites history, so the push has to be forced
git push --force-with-lease origin float_time
Maybe it is due to my background in physics, but I would expect a very common use case for time series would involve "timestamps" that are simply single or double precision floats. After all, time is just a real parameter. Often objects like DateTime are a necessary evil, and they are so (comparatively) difficult to work with that it makes sense to consider them as the main functionality, but sometimes this isn't the case and one can do fine with real numbers. Are there any plans to support this?