JuliaStats / TimeSeries.jl

Time series toolkit for Julia

Time series filtering #456

Open imbrem opened 4 years ago

imbrem commented 4 years ago

I realized that, while there's an efficient way to map timeseries, I can't find an efficient way to filter them. I think adding a filter function would be pretty helpful, and of course I'd be willing to make a PR myself.

Another feature I'd like to see would be, just like we can merge time series, to have a merge-like operation which takes in a time series and a (sorted) list of dates (or perhaps add a keyword argument to merge for this and make merge with just one item be the identity) and returns only the elements of that time series which have a date in the sorted list of dates. I'd also be willing to write a PR for this one, but would like some advice on whether I should make a separate function or add a keyword argument to merge. In the latter case, we should also support passing in multiple sorted date vectors; perhaps we could simply overload merge to accept sorted date vectors as well as TimeArrays.

iblislin commented 4 years ago

Another feature I'd like to see would be, just like we can merge time series, to have a merge-like operation which takes in a time series and a (sorted) list of dates (or perhaps add a keyword argument to merge for this and make merge with just one item be the identity) and returns only the elements of that time series which have a date in the sorted list of dates. I'd also be willing to write a PR for this one,

I guess getindex is the one you want.

julia> cl[[Date(2001, 5, 1), Date(2001, 10, 1)]]
2×1 TimeArray{Float64,1,Date,Array{Float64,1}} 2001-05-01 to 2001-10-01
│            │ Close │
├────────────┼───────┤
│ 2001-05-01 │ 25.93 │
│ 2001-10-01 │ 15.54 │

imbrem commented 4 years ago

Well that was dumb haha! Still a pretty new user so I didn't see it. Thanks! What about the other ones?

EDIT: Also does getindex just skip missing indices (like merge) and does it assume the indices are sorted?

iblislin commented 4 years ago

What about the other ones?

If you mean filter!, let's discuss it in #436.

Also does getindex just skip missing indices (like merge) and does it assume the indices are sorted?

It skips missing indices, and if the indices are not sorted, getindex sorts them first.

And... I just found another issue while testing it. Since we are starting to move toward accepting duplicate time indices, cl[[Date(2001, 10, 1), Date(2001, 5, 1), Date(2011, 5, 1)]] should output duplicated time indices. I'm going to file a separate issue for that.
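
For reference, the date-vector lookup described above can be tried on a small hand-built series. This is only a sketch: the `cl` below is made-up data (not the `cl` used earlier in the thread), and the comments restate the behavior described in this comment rather than a documented guarantee.

```julia
using Dates
using TimeSeries

# Made-up closing prices on three dates (illustrative data only).
dates = [Date(2001, 5, 1), Date(2001, 7, 2), Date(2001, 10, 1)]
cl = TimeArray(dates, [25.93, 23.9, 15.54], [:Close])

# Request dates out of order, including one that is not in the series.
wanted = [Date(2001, 10, 1), Date(2001, 5, 1), Date(2011, 5, 1)]
sub = cl[wanted]

# As described above, the lookup is expected to skip the absent date
# and return the matching rows in sorted time order.
println(timestamp(sub))
println(values(sub))
```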

findmyway commented 3 years ago

julia> cl[[Date(2001, 5, 1), Date(2001, 10, 1)]]
2×1 TimeArray{Float64,1,Date,Array{Float64,1}} 2001-05-01 to 2001-10-01
│            │ Close │
├────────────┼───────┤
│ 2001-05-01 │ 25.93 │
│ 2001-10-01 │ 15.54 │

One thing I found not very intuitive is that the display "2001-05-01 to 2001-10-01" can be misleading when the TimeArray is large and not consecutive.

Another one is findall; see this:

julia> findall(>(100), ohlc[:Close])
ERROR: MethodError: no method matching keys(::TimeArray{Float64, 1, Date, Vector{Float64}})

Stacktrace:
 [1] pairs(collection::TimeArray{Float64, 1, Date, Vector{Float64}})
   @ Base ./abstractdict.jl:138
 [2] findall(testf::Base.Fix2{typeof(>), Int64}, A::TimeArray{Float64, 1, Date, Vector{Float64}})
   @ Base ./array.jl:2153
 [3] top-level scope
   @ REPL[30]:1

Not sure if this feature is needed here.

iblislin commented 3 years ago

One thing I found not very intuitive is that the display "2001-05-01 to 2001-10-01" can be misleading when the TimeArray is large and not consecutive.

Hmm, but it also shows the number of rows and columns. And I don't have any idea how to improve the printing at the moment.


Well, I think most cases of findall can be handled by broadcasting.

findall(ohlc[:Close] .> 100)

Maybe I can make the signature findall(f, ::TimeArray) simply forward to broadcasting. Are there any cases that broadcasting cannot handle?

findmyway commented 3 years ago

Well, I think most cases of findall can be handled by broadcasting. findall(ohlc[:Close] .> 100)

Yeah, but I think it will create a temporary Bool vector here.

iblislin commented 3 years ago

Yeah, but I think it will create a temporary Bool vector here.

Ah, great point. I can make a PR for findall(f, ::TimeArray).
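
A rough sketch of what such a method could look like, assuming it only needs to forward the predicate to the underlying value vector. `find_rows` is a hypothetical name used here to avoid touching `Base.findall`; it is not part of TimeSeries.jl, and the data is made up.

```julia
using Dates
using TimeSeries

# Hypothetical helper (not a TimeSeries.jl function): apply the predicate
# to the raw values of a one-column TimeArray and return the row indices.
find_rows(f, ta::TimeArray{T,1}) where {T} = findall(f, values(ta))

# Made-up one-column series for illustration.
dates = collect(Date(2021, 1, 4):Day(1):Date(2021, 1, 8))
px = TimeArray(dates, [98.0, 101.5, 99.2, 103.7, 100.0], [:Close])

idx = find_rows(>(100), px)   # row indices where Close > 100
hits = px[idx]                # TimeArray restricted to those rows
println(idx)
println(timestamp(hits))
```

Whether this actually avoids the temporary Bool vector depends on how Base implements findall(f, A) for plain arrays in the Julia version in use, so it would be worth benchmarking before relying on it.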