JuliaStats / TimeSeries.jl

Time series toolkit for Julia

Rename columns #267

Closed femtotrader closed 8 years ago

femtotrader commented 8 years ago

After doing

start = Date(2000,1,30)
periods = 9
freq = Dates.Day(1)
dates = start:start + (periods - 1) * freq
mytime3 = TimeArray(collect(dates), rand(length(dates), 3), ["col1", "col2", "col3"])

I was looking for a way to rename columns. I thought that methods such as names! and rename! (as in DataFrames.jl) would support TimeArray, but that doesn't seem to be the case.

julia> names!(mytime3, ["c1", "c2", "c3"])
ERROR: MethodError: `names!` has no method matching names!(::TimeSeries.TimeArray{Float64,2,Date,Array{Float64,2}}, ::Array{ASCIIString,1})
Closest candidates are:
  names!(::DataFrames.AbstractDataFrame, ::Any)

and

julia> rename!(mytime3, Dict("col1"=>"c1"))
ERROR: MethodError: `rename!` has no method matching rename!(::TimeSeries.TimeArray{Float64,2,Date,Array{Float64,2}}, ::Dict{ASCIIString,ASCIIString})
Closest candidates are:
  rename!(::DataFrames.Index, ::Any)
  rename!(::DataFrames.Index, ::Any, ::Any)
  rename!(::DataFrames.AbstractDataFrame, ::Any...)

milktrader commented 8 years ago

The TimeArray is immutable, so you can't change things. You can create a new TimeArray, though. This immutability is by design.

dourouc05 commented 8 years ago

I think there is still something to do here: how do you easily create a copy of a time series by changing only a few fields? The only syntax I can think of is:

TimeArray(timestamp(ta), values(ta), ["col1", "col2"], meta(ta))

In Scala, a language that promotes immutability as a core concept, you would use a copy() function (automatically defined for case classes: http://www.scala-lang.org/docu/files/ScalaReference.pdf, section 5.3.2, page 68); adapted to Julia, this would give:

copy(ta, colnames=["col1", "col2"])

I think it's a much clearer and cleaner way than creating a new TimeArray from scratch using the constructor. (What if a new field is added to TimeArray?)

The implementation could be quite simple:

copy(ta; timestamp=ta.timestamp, values=ta.values, colnames=ta.colnames, meta=ta.meta) = 
  TimeArray(timestamp, values, colnames, meta)

That does not fit with current Julia practices, from what I can see, but there is no real pattern for those immutable types AFAIK.
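As a minimal sketch of that copy-with-changes idea: ToySeries below is a hypothetical stand-in written in current Julia syntax, not the TimeSeries.jl type, and copywith is an illustrative name chosen to avoid touching Base.copy.

```julia
# Hypothetical stand-in for TimeArray; the field names mirror the thread,
# but this is not the TimeSeries.jl type.
struct ToySeries
    timestamp::Vector{Int}
    values::Vector{Float64}
    colnames::Vector{String}
    meta::Any
end

# Copy `ts`, replacing only the fields passed as keyword arguments.
# Defaults are evaluated per call, so untouched fields come from `ts`.
function copywith(ts::ToySeries; timestamp=ts.timestamp, values=ts.values,
                  colnames=ts.colnames, meta=ts.meta)
    return ToySeries(timestamp, values, colnames, meta)
end

ts = ToySeries([1, 2, 3], [0.1, 0.2, 0.3], ["col1"], nothing)
renamed = copywith(ts; colnames=["c1"])
```

Note that the untouched fields are shared with the original rather than duplicated, so this is a shallow copy.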

milktrader commented 8 years ago

Changing colnames and meta could be implemented easily as noted and perhaps name the methods colnames! and meta!? The non-bang methods are already being used to return only that respective field.

The issue to overcome with this pattern is that you cannot really generalize it to timestamp and values, since those fields must match in length. Maybe update!? That's what it would be useful for: updating a TimeArray with a new observation as time progresses.

femtotrader commented 8 years ago

To my understanding, methods ending with an exclamation point modify their argument in place. From http://docs.julialang.org/en/release-0.4/stdlib/base/: "By convention, function names ending with an exclamation point (!) modify their arguments." That is not the case here. The "setter" methods could be named colnames and meta, because they would take different parameters than the "getters".
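For illustration, the convention is visible in Base itself, where sort returns a new array and sort! reorders its argument:

```julia
v = [3, 1, 2]

sorted = sort(v)                # non-mutating: returns a new array
unchanged = (v == [3, 1, 2])    # `v` still has its original order here

sort!(v)                        # mutating: reorders `v` itself
```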

milktrader commented 8 years ago

Yeah, that's a good point. Let's keep ! out of this.

milktrader commented 8 years ago

Curious how Haskell and Scala treat this, must admit a bit rusty in those.

ararslan commented 8 years ago

I'm a little disappointed that "a bit rusty" was not a Rust pun.

milktrader commented 8 years ago

What about this?

TimeArrays are immutable objects so they cannot be changed in-place. They can be copied though with the update method, which creates a new TimeArray.

# insert new observation
update(ta::TimeArray, timestamp, values; colnames=ta.colnames, meta=ta.meta)

# change colnames
update(ta::TimeArray, colnames::Vector{UTF8String}) 

# change meta
update(ta::TimeArray, meta)

I think this works as advertised but haven't tried it.

milktrader commented 8 years ago

Yep, that works.

julia> using TimeSeries, MarketData

julia> function update(ta::TimeArray, timestamp, values; colnames=ta.colnames, meta=ta.meta)
       TimeArray(timestamp, values, colnames, meta)
       end
update (generic function with 3 methods)

julia> function update(ta::TimeArray, colnames::Vector{UTF8String})
       TimeArray(ta.timestamp, ta.values, colnames, ta.meta)
       end
update (generic function with 3 methods)

julia> function update(ta::TimeArray, meta)
       TimeArray(ta.timestamp, ta.values, ta.colnames, meta)
       end
update (generic function with 3 methods)

julia> new_timestamp = cl.timestamp[2:end];

julia> new_values = cl.values[2:end];

julia> add_time_and_values = update(cl, new_timestamp, new_values)
499x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-04 to 2001-12-31

             Close     
2000-01-04 | 102.5     
2000-01-05 | 104.0     
2000-01-06 | 95.0      
2000-01-07 | 99.5      
⋮
2001-12-26 | 21.49     
2001-12-27 | 22.07     
2001-12-28 | 22.43     
2001-12-31 | 21.9      

julia> change_colnames = update(cl, UTF8String["New Close"])
500x1 TimeSeries.TimeArray{Float64,1,Date,Array{Float64,1}} 2000-01-03 to 2001-12-31

             New Close  
2000-01-03 | 111.94     
2000-01-04 | 102.5      
2000-01-05 | 104.0      
2000-01-06 | 95.0       
⋮
2001-12-26 | 21.49      
2001-12-27 | 22.07      
2001-12-28 | 22.43      
2001-12-31 | 21.9       

julia> change_meta = update(cl, "Stale Apple stock prices");

julia> change_meta.meta
"Stale Apple stock prices"

dourouc05 commented 8 years ago

In my understanding, you could also provide macros that appear to "modify in place" the objects, by expanding to code like:

ta = update(ta, …)

Thus, for example, a @meta!(ta, nm) would generate this code:

ta = update(ta, nm)
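A rough sketch of such a macro, under stated assumptions: ToySeries and this update method are hypothetical stand-ins in current Julia syntax, not TimeSeries.jl code.

```julia
# Hypothetical stand-in for TimeArray, reduced to the two fields needed here.
struct ToySeries
    values::Vector{Float64}
    meta::Any
end

# Non-mutating "setter": builds a new ToySeries with different metadata.
update(ts::ToySeries, meta) = ToySeries(ts.values, meta)

# `@meta! x m` expands to `x = update(x, m)`. `esc` makes the assignment
# target the caller's variable rather than a macro-local one.
macro meta!(ts, meta)
    return esc(:($ts = update($ts, $meta)))
end

ts = ToySeries([1.0, 2.0], "old")
original = ts        # second binding to the same object
@meta! ts "new"      # rebinds `ts`; the old object is untouched
```

The macro only rebinds the variable: any other reference to the old object, such as original here, still sees the old metadata.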

What is the Julia point of view on this kind of operation? I'm pretty sure this is not an issue just for TimeSeries.jl, but I could not find any common wisdom about how to update an immutable data structure.

Wouldn't the "add measurements" part deserve some specific syntax? Something like push(ta::TimeArray, timestamp, value)?

milktrader commented 8 years ago

Yeah, I think push and push! are not good candidates. They have established semantics for normal arrays (which are not immutable).

I'm not sure update is good, but I think it's the best candidate.

Another one is copy, but that gets into semantics dissimilar to what we're aiming for here.

milktrader commented 8 years ago

This is a good reference for how Julia treats immutables.

milktrader commented 8 years ago

After giving this some thought, I think update will be the best name for our new constructor.

Though TimeArray is immutable, the object is not immutably bound to a variable. The immutability is more of a safeguard or speed bump to avoid unwanted changes. I'll work out the details to implement this.

milktrader commented 8 years ago

After coding this out, I think update should only add a new timestamp/value pair. To change column names, the method should be called rename.
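The split could look roughly like the sketch below. This is not the TimeSeries.jl implementation: ToySeries is a hypothetical single-column stand-in in current Julia syntax, and both the timestamp-ordering check and the name-count check are assumptions added for illustration.

```julia
# Hypothetical stand-in for TimeArray with a single value column.
struct ToySeries
    timestamp::Vector{Int}
    values::Vector{Float64}
    colnames::Vector{String}
end

# Append one observation; returns a new ToySeries, `ts` is left untouched.
# Adding the timestamp and value together keeps both vectors the same length.
function update(ts::ToySeries, tstamp::Int, value::Real)
    # Assumption: observations arrive in increasing time order.
    isempty(ts.timestamp) || tstamp > ts.timestamp[end] ||
        throw(ArgumentError("new timestamp must come after the last one"))
    return ToySeries(vcat(ts.timestamp, tstamp), vcat(ts.values, value), ts.colnames)
end

# Replace the column names, checking that the count stays the same.
function rename(ts::ToySeries, colnames::Vector{String})
    length(colnames) == length(ts.colnames) ||
        throw(ArgumentError("expected $(length(ts.colnames)) column name(s)"))
    return ToySeries(ts.timestamp, ts.values, colnames)
end

ts = ToySeries([1, 2], [10.0, 11.0], ["Close"])
longer = update(ts, 3, 12.5)
renamed = rename(longer, ["New Close"])
```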

milktrader commented 8 years ago

Version 0.8.5 implements this.

femtotrader commented 8 years ago

Thanks