JuliaStats / TimeSeries.jl

Time series toolkit for Julia

support for real-valued time #300

Open ExpandingMan opened 7 years ago

ExpandingMan commented 7 years ago

Maybe it is due to my background in physics, but I would expect a very common use case for time series would involve "timestamps" that are simply single or double precision floats. After all, time is just a real parameter. Often objects like DateTime are a necessary evil, and they are so (comparatively) difficult to work with that it makes sense to consider them as the main functionality, but sometimes this isn't the case and one can do fine with real numbers. Are there any plans to support this?

GordStephen commented 7 years ago

This has come up before (e.g. #250) and I think it makes sense.

With TimeAxisArrays.jl I started reproducing the functionality from this package on top of the AxisArray datatype, which supports indexing with arbitrary time units (it also comes with nice interval selections out of the box). It's almost at feature parity (see https://github.com/GordStephen/TimeAxisArrays.jl/issues/1). Unfortunately it's been a while (pre-0.5) since I've been able to work on it, so I don't know what kind of shape it's in for modern use...

milktrader commented 7 years ago

I'm with @GordStephen on this. I think this functionality would be better suited for a completely different type than TimeArray.

If you want to experiment, you could change the bound on the timestamp type in the TimeArray type from TimeType to Union{TimeType, Float64}.

immutable TimeArray{T, N, D<:TimeType, A<:AbstractArray} <: AbstractTimeSeries

It wouldn't be trivial to add all the new methods dispatched on Float64 instead of TimeType but it also could be done without altering any existing methods.

You might also pick and choose which methods to define.
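
A minimal sketch of that experiment, written in current Julia syntax (struct rather than immutable). SketchTimeArray and from_sketch are hypothetical names used only for illustration; this is not the package's TimeArray definition, just a stand-in showing the widened bound on the timestamp parameter D.

```julia
using Dates

# Hypothetical stand-in for TimeArray, NOT the package's actual definition.
# The only point of interest is the widened bound on the timestamp parameter D.
struct SketchTimeArray{T, D<:Union{TimeType, Float64}}
    timestamp::Vector{D}
    values::Vector{T}

    function SketchTimeArray(timestamp::Vector{D}, values::Vector{T}) where {T, D<:Union{TimeType, Float64}}
        length(timestamp) == length(values) ||
            throw(DimensionMismatch("timestamp and values must have the same length"))
        issorted(timestamp) || throw(ArgumentError("timestamps must be sorted"))
        new{T, D}(timestamp, values)
    end
end

# A method that only compares timestamps works for both kinds of axis unchanged.
function from_sketch(ta::SketchTimeArray, t)
    keep = ta.timestamp .>= t
    return SketchTimeArray(ta.timestamp[keep], ta.values[keep])
end

a = SketchTimeArray(collect(1:10.0), collect(1:10))                      # float axis
b = SketchTimeArray([Date(2017, 1, 1) + Day(i) for i in 0:2], rand(3))   # date axis
```

Methods that rely only on ordering and equality of timestamps carry over for free; methods that do date arithmetic would need separate definitions for the Float64 case.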

ExpandingMan commented 7 years ago

It seems perfectly reasonable to make this a different type, but it seems to me that easy conversion back and forth should be imperative. Right now the issue is that most actual analysis that one would do on a time series requires the temporal axis to be treated simply as a real number, so the inability to treat the time series this way seriously curtails the usefulness of this package.

milktrader commented 7 years ago

@ExpandingMan have you tried tweaking the existing type yet? I'd be interested to see what this looks like. I'd also be happy to make it a branch for this repo.

ExpandingMan commented 7 years ago

Unfortunately I haven't really had time to work on this yet. What I've been doing in the meantime is writing my package to accept pairs of Vectors (the first being the time axis, the second being the ordinate of the time series) and then later I will wrap those with other functions that take TimeArray and extract the appropriate vector somehow. I expect what this will look like is something that converts, for instance, DateTimes to the number of seconds since t=0 as a float.

Ultimately, it'd be nice to do something like

convertaxis(Dates.Second, ta)

where ta is a TimeArray. Obviously the only labor intensive part about this should be getting the rest of the API up to parity with TimeType.

I'll try setting something up when I have time, but right now my priority is working on the algorithms I need to help me work on forecasting.

milktrader commented 7 years ago

I changed like 7 lines of code to get this behavior ...

julia> TimeArray(collect(1:10.), collect(1:10))
10x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 10.0

1.0 | 1        
2.0 | 2        
3.0 | 3        
4.0 | 4        
⋮
7.0 | 7        
8.0 | 8        
9.0 | 9        
10.0 | 10     

Is this what you have in mind?

milktrader commented 7 years ago

Appears this small change is actually pretty robust at first pass ...

julia> foo = TimeArray(collect(1:10.), collect(1:10));

julia> bar = TimeArray(collect(1:10.), collect(101:110));

julia> head(foo)
1x1 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 1.0 to 1.0

1.0 | 1

julia> tail(bar, 3)
3x1 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 8.0 to 10.0

8.0 | 108       
9.0 | 109       
10.0 | 110

julia> lag(foo)
9x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 2.0 to 10.0

2.0 | 1       
3.0 | 2       
4.0 | 3       
5.0 | 4       
⋮
7.0 | 6       
8.0 | 7       
9.0 | 8       
10.0 | 9

julia> lead(bar)
9x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 9.0

1.0 | 102       
2.0 | 103       
3.0 | 104       
4.0 | 105       
⋮
6.0 | 107       
7.0 | 108       
8.0 | 109       
9.0 | 110

julia> foo .+ bar
10x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 1.0 to 10.0

      .+        
1.0 | 102       
2.0 | 104       
3.0 | 106       
4.0 | 108       
⋮
7.0 | 114       
8.0 | 116       
9.0 | 118       
10.0 | 120

julia> percentchange(foo)
9x1 TimeSeries.TimeArray{Float64,1,Float64,Array{Float64,1}} 2.0 to 10.0

2.0 | 1.0     
3.0 | 0.5     
4.0 | 0.3333  
5.0 | 0.25    
⋮
7.0 | 0.1667  
8.0 | 0.1429  
9.0 | 0.125   
10.0 | 0.1111 

julia> merge(foo, bar)
10x2 TimeSeries.TimeArray{Int64,2,Float64,Array{Int64,2}} 1.0 to 10.0

      _1       _2        
1.0 | 1        101       
2.0 | 2        102       
3.0 | 3        103       
4.0 | 4        104       
⋮
7.0 | 7        107       
8.0 | 8        108       
9.0 | 9        109       
10.0 | 10       110    

julia> from(foo, 6.0)
5x1 TimeSeries.TimeArray{Int64,1,Float64,Array{Int64,1}} 6.0 to 10.0

6.0 | 6        
7.0 | 7        
8.0 | 8        
9.0 | 9        
10.0 | 10

ExpandingMan commented 7 years ago

Yes, but I think it needs to accommodate a couple of things. For one, there should be some way of easily converting between DateTime and floats (like the example I gave, something like convertaxis(Dates.Second, ta) to convert to floats, or maybe convertaxis(Dates.DateTime, DateTime(3147, 3, 27), ta) to convert to DateTime).

Again, the more difficult part will be making sure that all the existing functionality in the package also applies to floating point time (where applicable). EDIT: looks like you already did some of this!

Thanks for looking into this!

ararslan commented 7 years ago

Perhaps I'm misunderstanding what you mean, but if actual convert methods should be defined, e.g. Float64(::DateTime) becomes meaningful, that should happen in Base. Packages shouldn't make those kinds of extensions of Base functions/constructors using Base types.

milktrader commented 7 years ago

I just pushed the float_time branch. You should be able to checkout the branch and give it a try.

ExpandingMan commented 7 years ago

Sorry, I should have been clearer what I meant about "converting". There was a reason I was suggesting using a different symbol like convertaxis instead of convert.

Of course you should not be able to do convert(Float64, DateTime(3147, 3, 27)). However, what you should be able to do is, given a time that serves as t=0, compute the amount of time (in floating point, in whatever units) which has elapsed between some DateTime and t=0. This should be easy to do, so that the axis can be easily taken from something with DateTime values to something with the appropriate float values.

As a somewhat silly example, I should be able to easily take [Date(3147, 3, 27), Date(3147, 3, 28), Date(3147, 3, 29)], specify that I want units of days such that t=0 corresponds to the first date and get [0.0, 1.0, 2.0].
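
A minimal sketch of that conversion, assuming current Julia and only the standard Dates library. The name elapsed_in is hypothetical and is not the convertaxis proposed in this thread; it returns a plain vector rather than a TimeArray. It assumes the unit is a fixed-length period of a millisecond or coarser (Week, Day, Hour, Minute, Second, Millisecond); calendar units such as Month have no fixed length and are not handled.

```julia
using Dates

# Hypothetical helper: time elapsed since t0, as floats in units of the period type P.
# Everything goes through whole milliseconds, then one floating point division.
function elapsed_in(::Type{P}, times::AbstractVector{<:TimeType}; t0 = first(times)) where {P<:Period}
    unit_ms = Dates.value(convert(Millisecond, P(1)))   # milliseconds in one unit of P
    return [Dates.value(convert(Millisecond, t - t0)) / unit_ms for t in times]
end

dates = [Date(3147, 3, 27), Date(3147, 3, 28), Date(3147, 3, 29)]
elapsed_in(Day, dates)   # [0.0, 1.0, 2.0]
```

Passing a different t0 keyword shifts the origin, so the same helper covers the case where t=0 is not the first observation.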

Does that make sense? I'm not saying it's something that'd be difficult for us to implement, or even something that it'd be difficult for users to do themselves (the needed functions already exist in Dates, so it's actually very easy to do!), it just seems to me that this functionality would be needed so often in actual analysis that it should be built in.

Maybe this is actually so trivial that it's a silly request, but again, it's more about how commonly used I'd think this would be (for analysis purposes) than it is about how hard it is for users to duplicate themselves.

ExpandingMan commented 7 years ago

Well, this is nowhere near as elegant as I was hoping for, but see this gist.

Now, if you do something like

ta = TimeArray([Date(3147, 7, 1) + Dates.Day(i) for i in 0:2], rand(3))
t, x = convertaxis(Dates.Hour, ta)

for t you will get [0.0, 24.0, 48.0]. You can pass other Periods to get different conversions.

You'll probably ask why I had to bother with convert_ms and convert_day. Well, unfortunately if you do something like convert(Dates.Second, Dates.Day(1)) you will get an InexactError. This is because the Period types all store integers, not floats. Defining convert_ms and convert_day, while unfortunate, is the least elaborate way I could see to achieve the desired result.

Anyway, something like this is what I thought it would be important to have for going back and forth between time series with Date or DateTime axes and time series with real axes (of course I didn't do the inverse conversion yet).
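
For the inverse direction, one possible sketch under the same assumptions (current Julia, standard Dates library, fixed-length units of a millisecond or coarser). The name to_datetimes is hypothetical and is not part of the gist or of TimeSeries.jl. Results are rounded to the nearest millisecond because that is the resolution of DateTime.

```julia
using Dates

# Hypothetical helper: turn float offsets (in units of P) back into DateTimes,
# given the DateTime that corresponds to t = 0.
function to_datetimes(::Type{P}, t0::DateTime, t::AbstractVector{<:Real}) where {P<:Period}
    unit_ms = Dates.value(convert(Millisecond, P(1)))   # milliseconds in one unit of P
    return [t0 + Millisecond(round(Int, x * unit_ms)) for x in t]
end

stamps = to_datetimes(Hour, DateTime(3147, 7, 1), [0.0, 24.0, 48.0])
```

Because of the rounding step, a round trip through floats is only guaranteed to be exact to the millisecond.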

Let me know if you think this would be a good functionality to include in the float_time branch, and I'll make a PR at some point (of course I'll change this to output TimeArray).

milktrader commented 7 years ago

By all means make a PR to the float_time branch. It would be easier to see the code to make comments. Github also has a feature where comments can be made on each code line which I'm quite fond of.

Since we're talking about an experimental branch, don't be concerned with elegance and efficiency right away; those lumps can always be smoothed out. Just plop something in there.

ExpandingMan commented 7 years ago

Just to let you know, I haven't really been working on this. The current method I'm using is pretty hackish, you can see it here. There are a number of annoyances when trying to do this. One is that the Julia Period types only store integers, so it doesn't make sense to do something like, for instance convert(Minute, Second(35)). What I've done is working fine, but I don't feel like it's really making proper use of the Julia Dates package, and it isn't as general as I feel it would need to be to be appropriate for TimeSeries.jl. It might be worth waiting for v0.6 in which there will be a Dates.Time type (for storing times of day).
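
The integer-storage limitation can be seen directly in a short sketch, assuming the behavior of the Dates standard library in current Julia (details differed in the 0.5-era releases discussed here). The variable names are illustrative only.

```julia
using Dates

# Period types wrap an integer, so there is no Minute value representing 35 seconds.
inexact = try
    convert(Minute, Second(35))
    false
catch err
    err isa InexactError
end

# Exact multiples convert without trouble.
two_minutes = convert(Minute, Second(120))

# Dropping to a plain number first sidesteps the restriction.
minutes_as_float = Dates.value(Second(35)) / 60
```

This is why a float-valued time axis has to leave the Period types behind at some point rather than staying inside them.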

Anyway, I'm not seriously planning to do a PR soon, as I'd need to spend a lot more time thinking about Periods. Frankly, it would probably be warranted to make some changes to the Dates package itself, I really think it would be best if Period types worked like, for instance Day{T<:Number}, but that would require a lot of work.

If you see no other problem with the float_time branch otherwise, you may want to consider merging it into master without worrying about conversions just yet.

By the way, humans have really absurd ways of measuring time!

milktrader commented 7 years ago

Agreed on how humans view time!

I'm not planning to merge the float_time branch anytime soon. There is benchmarking, documentation, and a boatload of tests that would be required. I think this is an edge case, but we can certainly keep the branch open.

Also, I'll keep this issue open to funnel discussion to this topic.

ExpandingMan commented 7 years ago

Ok, sounds good.

I just wanted to re-iterate my point about whether this is an edge case though: I really, really think that it's not. In fact, the only situation I can think of in which you wouldn't need to do this is when all the datapoints are equally spaced in time (a common use-case yes, but hardly general). Any time that is not the case you really need to know when they are occurring. It certainly has come up a lot in the work I've done on segmentation and regression, none of which is exotic by any means. (Though, perhaps the most compelling reason of all is just how horrible date-time is!)

milktrader commented 7 years ago

Wondering how to close this and think maybe we can add in the documentation that this branch exists. Also, who has the git-fu for continuing to update the branch with master changes?

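ararslan commented 7 years ago

git checkout float_time
git fetch origin
git rebase origin/master
# Fix any merge conflicts manually, then run: git rebase --continue
# A rebase rewrites history, so the push has to be forced
git push --force-with-lease origin float_time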