JuliaStats / TimeSeries.jl

Time series toolkit for Julia

Collapse should work with user-defined periods. #296

Open dourouc05 opened 7 years ago

dourouc05 commented 7 years ago

In my code, I mostly use Dates.Period objects to deal with time (for example, Day(1), Week(1)), because it makes many parts of my code easier (mostly date arithmetic)… everywhere except when using collapse with TimeSeries.jl. Currently, collapse takes a function argument, so I have to translate manually between periods and functions (mapping Day(1) to Dates.day, Week(1) to Dates.week). It would be very useful to pass periods directly as arguments to collapse. For example:

collapse(ts, values -> sum(values), Day(1))

For TimeSeries.jl, this would be a useful feature, as you could collapse over any kind of time period (output one value every two days, for example, instead of being limited to one value per day).
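
To make the request concrete, here is a minimal sketch of period-based collapsing on plain vectors, using only the Dates standard library. The name collapse_by_period is hypothetical and is not part of TimeSeries.jl. It buckets timestamps with floor(t, period), so for multi-unit periods such as Day(2) the buckets are aligned to Dates' rounding epoch rather than to the first observation.

```julia
using Dates

# Hypothetical helper (not a TimeSeries.jl function): aggregate `values`
# into buckets of width `period`, labelling each bucket by the floored
# timestamp of its members.
function collapse_by_period(timestamps::AbstractVector{<:TimeType},
                            values::AbstractVector,
                            f::Function,
                            period::Period)
    length(timestamps) == length(values) ||
        throw(ArgumentError("timestamps and values must have equal length"))
    buckets = [floor(t, period) for t in timestamps]
    out_t = unique(buckets)                      # keeps first-seen order
    out_v = [f(values[buckets .== b]) for b in out_t]
    return out_t, out_v
end

# Twelve observations, one every six hours over three days.
ts = collect(DateTime(2000, 1, 1):Hour(6):DateTime(2000, 1, 3, 18))
vals = ones(length(ts))

t, v = collapse_by_period(ts, vals, sum, Day(1))
```

The boolean mask is recomputed for every bucket, so this is quadratic in the worst case; it is meant to show the intended semantics, not to compete with the existing implementation.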

If there is interest in this, I can propose an implementation. (It would probably not be as fast as the current implementation, though.)

femtotrader commented 7 years ago

You might be interested in this API for resampling time series (TimeArray from TimeSeries.jl):

https://github.com/femtotrader/TimeSeriesResampler.jl/issues/3

dourouc05 commented 7 years ago

Yes, it seems this API would fit my needs! (But I would need that package to be published…)

Shouldn't this be added in the docs and as a warning to TimeSeries? (For the warning, I mean implementing collapse for Dates.Period when TimeSeriesResampler is not loaded just to show a warning telling the user to install that package.)

femtotrader commented 7 years ago

TimeSeriesResampler now supports Period (TimePeriod or DatePeriod).

See https://github.com/femtotrader/TimeSeriesResampler.jl

but I can't publish TimeSeriesResampler.jl on METADATA until TimeFrames.jl is also published.

https://github.com/femtotrader/TimeFrames.jl/issues/7

femtotrader commented 7 years ago

TimeFrames.jl is now published thanks to https://github.com/JuliaLang/METADATA.jl/pull/6997. Waiting for TimeSeriesResampler.jl to be published: https://github.com/JuliaLang/METADATA.jl/pull/7003

dourouc05 commented 7 years ago

Great, thanks! What about updating TimeSeries' documentation?

(Just as a side note: I cannot directly use your packages, as they are packaged only for Julia 0.5, and migrating will take me a bit of time, as I have a few compatibility problems…)

femtotrader commented 7 years ago

TimeSeriesResampler.jl is now registered.

milktrader commented 7 years ago

@dourouc05 can you provide a reproducible example (preferably using MarketData) to demonstrate your objective? It seems to me you just want a change in the API, but I might not be reading it correctly.

milktrader commented 7 years ago

I think I get the gist of your point @dourouc05 and I've changed the title of the issue. Please comment on the change.

dourouc05 commented 7 years ago

@milktrader: yes, that is exactly what I was trying to do (with a more generic API for this function).

milktrader commented 7 years ago

I don't know offhand, but how does Base.Dates handle irregular intervals? (I know it does.)

The second argument in the collapse method should be able to take this as a function.

milktrader commented 7 years ago

Here is an example of how one could get 2-day intervals:

julia> first(cl.timestamp):Day(2):last(cl.timestamp);

julia> cl[collect(ans)]
253x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-03 to 2001-12-31

             Close     
2000-01-03 | 111.94    
2000-01-05 | 104.0     
2000-01-07 | 99.5      
2000-01-11 | 92.75     
⋮
2001-12-19 | 21.62     
2001-12-21 | 21.0      
2001-12-27 | 22.07     
2001-12-31 | 21.9      

milktrader commented 7 years ago

@quinnj what type would you dispatch on given that Day(2) is a Base.Dates.Day?

Once we figure out that piece of the puzzle, we simply write a method that takes that value and underneath creates a range and collects it.
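
As a sketch of that idea on plain vectors: Day(2) is a Dates.Day, which is a subtype of the abstract Dates.Period, so a method can dispatch on Period to accept any step. The helper names period_grid and sample_every are hypothetical and are not part of TimeSeries.jl.

```julia
using Dates

# Hypothetical helper: all grid points from the first to the last
# observation, stepping by `step`.
function period_grid(timestamps::AbstractVector{<:TimeType}, step::Period)
    return collect(first(timestamps):step:last(timestamps))
end

# Hypothetical helper: keep only the grid points that are actual
# observations (a grid point may fall on a weekend or holiday).
function sample_every(timestamps::AbstractVector{<:TimeType}, step::Period)
    return intersect(period_grid(timestamps, step), timestamps)
end

# Trading days only: 2000-01-08 and 2000-01-09 are missing.
dates = [Date(2000, 1, 3), Date(2000, 1, 4), Date(2000, 1, 5),
         Date(2000, 1, 6), Date(2000, 1, 7), Date(2000, 1, 10),
         Date(2000, 1, 11)]

grid = period_grid(dates, Day(2))
picked = sample_every(dates, Day(2))
```

Here the grid contains 2000-01-09, which is not an observation, so picked skips it and goes from 2000-01-07 straight to 2000-01-11, matching the first rows of the output shown earlier.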

milktrader commented 7 years ago

julia> bar = first(cl.timestamp):Day(2):last(cl.timestamp)
2000-01-03:2 days:2001-12-31

julia> isa(bar, StepRange)
true

milktrader commented 7 years ago

@dourouc05 when you want a user-defined range, let's say 2-day interval, does that mean two calendar days or the last two observations of a TimeArray?

femtotrader commented 7 years ago

Another interesting question: should we consider the beginning of the day or the end of the day?

This probably does not matter much for days, but for months the difference is much clearer.

If you resample a time series, do you want the timestamp at the end of the month or on the first day of the month?

That's the reason why using Period (TimePeriod like Hour, Minute... or DatePeriod like Day, Month...) is not enough, and why the concept of TimeFrame is probably better suited to this purpose.

period = Month(1)

(with TimeSeriesResampler, a Period is converted to a TimeFrame when calling the resample function)

But here, you can define lambda functions such as

(dt) -> floor(dt, period)  # begin of month

or

(dt) -> ceil(dt, period) - Day(1)  # end of month (with Date)

or

(dt) -> ceil(dt, period) - Millisecond(1)  # end of month (with DateTime)
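
For illustration, the three lambdas can be tried with the Dates standard library alone. The variable names below are only for this sketch. One edge case worth noting: ceil returns its argument unchanged when it already sits on a period boundary, so the end-of-month lambdas return the end of the previous month for a timestamp that is exactly the first of a month. Dates.lastdayofmonth does not have this behaviour.

```julia
using Dates

period = Month(1)

begin_of        = dt -> floor(dt, period)                   # begin of month
end_of_date     = dt -> ceil(dt, period) - Day(1)           # end of month (Date)
end_of_datetime = dt -> ceil(dt, period) - Millisecond(1)   # end of month (DateTime)

d  = Date(2000, 2, 15)
dt = DateTime(2000, 2, 15, 12)

b  = begin_of(d)          # 2000-02-01
e  = end_of_date(d)       # 2000-02-29 (2000 is a leap year)
et = end_of_datetime(dt)  # 2000-02-29T23:59:59.999

# Edge case: a date already on the boundary maps to the previous month's end.
edge = end_of_date(Date(2000, 2, 1))      # 2000-01-31
safe = lastdayofmonth(Date(2000, 2, 1))   # 2000-02-29
```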

dourouc05 commented 7 years ago

@milktrader maybe this is clearer. First, I have an hourly time series.

h1 h2 ... h24 ... h47 h48 h49 ... h72 ... h95 h96 ...

Then, for Day(2), I take the first 48 hours to make the first output value (starting at h1, even when it is in the middle of the day, as I currently do):

sum(h1 ... h48) sum(h49 ... h96) ...

But this is only my use case (for example, I know that my time series starts at midnight), and others might need more flexibility (as proposed by @femtotrader).
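
On plain values, this use case can be sketched with Base's Iterators.partition. The data below is made up (the hour index stands in for the observation). Note that partition keeps a shorter trailing chunk when the length is not a multiple of 48, so incomplete periods would need to be handled separately.

```julia
# Made-up hourly data: h1 ... h96 are represented by the values 1.0 ... 96.0.
hourly = collect(1.0:96.0)

# One sum per block of 48 consecutive hourly observations,
# starting from the first observation.
two_day_sums = [sum(chunk) for chunk in Iterators.partition(hourly, 48)]
```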

milktrader commented 7 years ago

So you want to compress 48 observations into one.

function compress{T,N,D}(ta::TimeArray{T,N,D}, interval::Int , timestamp::Function, value::Function=timestamp)

Just to clarify, this is the current API with interval::Int replacing period::Function.

The timestamp function is typically last but you can obviously use first as well.

milktrader commented 7 years ago

NOTE: the previously posted code has been debugged, but only 1-dimensional arrays are supported.

Try this out and tell me what you think

function compress{T,N,D}(ta::TimeArray{T,N,D}, interval::Int, position::Function=last, value::Function=position)

    # Drop trailing observations that do not fill a complete interval.
    len = length(ta) - (length(ta) % interval)

    # One timestamp per block of `interval` observations.
    t = D[]
    for ts in 1:interval:len
        push!(t, position(ta.timestamp[ts:ts+interval-1]))
    end

    # One value per block (1-dimensional values only).
    v = T[]
    for vs in 1:interval:len
        push!(v, value(ta.values[vs:vs+interval-1]))
    end

    TimeArray(t, v, ta.colnames, ta.meta)
end
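
The function above uses the older parametric method syntax (function f{T}(...)), which current Julia versions no longer accept. As a rough sketch of the same logic in the newer where syntax, here is a version that works on plain timestamp and value vectors instead of a TimeArray; the name compress_vectors is hypothetical and the TimeArray construction is left out.

```julia
using Dates

# Hypothetical adaptation of the `compress` idea to plain vectors:
# every block of `interval` consecutive observations becomes one point.
function compress_vectors(timestamps::AbstractVector{D},
                          values::AbstractVector,
                          interval::Integer,
                          position::Function=last,
                          value::Function=position) where {D}
    interval > 0 || throw(ArgumentError("interval must be positive"))
    length(timestamps) == length(values) ||
        throw(ArgumentError("timestamps and values must have equal length"))

    # Drop trailing observations that do not fill a complete block.
    len = length(values) - (length(values) % interval)
    starts = 1:interval:len

    t = D[position(timestamps[s:s+interval-1]) for s in starts]
    v = [value(values[s:s+interval-1]) for s in starts]
    return t, v
end

dates = collect(Date(2000, 1, 1):Day(1):Date(2000, 1, 7))
vals  = collect(1.0:7.0)

# Blocks of 3 observations; the 7th observation is dropped.
t_sum, v_sum   = compress_vectors(dates, vals, 3, last, sum)
t_last, v_last = compress_vectors(dates, vals, 3)
```

With position and value both defaulting to last, each block is represented by its final timestamp and final value; passing sum as the value function gives the per-block totals discussed earlier in the thread.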