JuliaStats / TimeSeries.jl

Time series toolkit for Julia

Convert DataFrame to TimeArray #290

Closed: femtotrader closed this issue 7 years ago

femtotrader commented 7 years ago

Hello,

I couldn't find an easy way in the docs to convert a DataFrame to a TimeArray.

using CSV, DataFrames, Dates, TimeSeries

dat = """
Date,Stock,Open,High,Low,Close,Volume
2016-09-29,KESM,7.92,7.98,7.92,7.97,149400
2016-09-30,KESM,7.96,7.97,7.84,7.9,29900
2016-10-04,KESM,7.8,7.94,7.8,7.93,99900
2016-10-05,KESM,7.93,7.95,7.89,7.93,77500
2016-10-06,KESM,7.93,7.93,7.89,7.92,130600
2016-10-07,KESM,7.91,7.94,7.91,7.92,103000
"""

io = IOBuffer(dat)

# readtable(io) in the DataFrames releases of the time; CSV.read replaces it.
# The Date column is parsed as Date values while reading.
df = CSV.read(io, DataFrame; types = Dict(:Date => Date))

println(df)

# Build a TimeArray from one timestamp column plus the chosen value columns
function to_timearray(df::DataFrame; timestamp::Symbol = :Date, colnames::Vector{Symbol} = Symbol[])
    if isempty(colnames)
        # Default: every column except the timestamp column
        colnames = filter(!=(timestamp), propertynames(df))
    end
    a_timestamp = Vector(df[!, timestamp])
    a_values = Matrix(df[!, colnames])
    # Current TimeSeries releases take column names as Symbols, not Strings
    return TimeArray(a_timestamp, a_values, colnames)
end

ta = to_timearray(df, colnames = [:Open, :High, :Low, :Close])
println(ta)

# Same result: select the columns first and let the default keep all but :Date
ta = to_timearray(df[:, [:Date, :Open, :High, :Low, :Close]])
println(ta)

Maybe such a function should be part of TimeSeries (without adding DataFrames as a dependency), or be given as a "recipe" in the docs.

milktrader commented 7 years ago

You can drop the `to_` and just call the method `timearray`, which would be more idiomatic Julia.

And yes, it would require DataFrames as a dependency, so this method belongs in a separate package.

nalimilan commented 7 years ago

> You can drop the `to_` and just call the method `timearray`, which would be more idiomatic Julia.

Actually, the idiomatic Julia version would be to add this as a method of the `TimeArray` constructor, and possibly of `convert`.
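A minimal sketch of that suggestion, assuming recent DataFrames and TimeSeries releases (which use Symbol column names): the DataFrame becomes one more method of the existing `TimeArray` constructor, with an optional `convert` method on top. The `:Date` default for the timestamp column is an assumption carried over from the example above. Defined in user code rather than in TimeSeries, DataFrames, or a dedicated package, such a method would be type piracy.

```julia
using DataFrames, Dates, TimeSeries

# Add a DataFrame method to the existing TimeArray constructor. The qualified
# name (TimeSeries.TimeArray) is what makes this extend rather than shadow it.
# `timestamp` names the column holding the dates (assumed default: :Date).
function TimeSeries.TimeArray(df::DataFrame; timestamp::Symbol = :Date)
    cols = filter(!=(timestamp), propertynames(df))  # value columns, in order
    return TimeArray(Vector(df[!, timestamp]), Matrix(df[!, cols]), cols)
end

# Optional: let convert(TimeArray, df) reuse the constructor above
Base.convert(::Type{TimeArray}, df::DataFrame) = TimeArray(df)

df = DataFrame(Date = [Date(2016, 9, 29), Date(2016, 9, 30)],
               Open = [7.92, 7.96], Close = [7.97, 7.9])
ta = TimeArray(df)
```

Dispatching on `DataFrame` keeps the call site identical to any other `TimeArray` construction, which is the point of the suggestion.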

milktrader commented 7 years ago

I've tried this before, and it would require DataFrames to be a dependency. But otherwise, yes: `TimeArray`, not `timearray`.

femtotrader commented 7 years ago

It may be possible to add this feature without adding DataFrames as a dependency.

Requires.jl (https://github.com/MikeInnes/Requires.jl) could be a solution, couldn't it?

milktrader commented 7 years ago

This will bloat the code. I'm against adding any dependencies.

I think a good project, though, would be to start a new package that does many of the things you've asked for in TimeSeries. Call it TimeSeriesReader, maybe?

If you want to start the package, I'd be happy to contribute. You don't need to stop at DataFrames. Why not include other sources, such as various databases? It might be a good place to implement streaming time series data.

femtotrader commented 7 years ago

Repository TimeSeriesReader.jl created.

Edit: renamed to https://github.com/femtotrader/TimeSeriesIO.jl, since it can both import into TimeSeries and export from it.

milktrader commented 7 years ago

Watching it ... 👍

femtotrader commented 7 years ago

Feel free to contribute. For example, I haven't been able to name the function `TimeArray`, and I don't understand why; that's the reason for the `to_TimeArray` name.
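One common cause (the failing code isn't shown in this thread, so this is a guess): a method definition using a bare name only extends a function from another module if that name was explicitly imported; otherwise the name must be qualified. In the sketch below, a hypothetical `Toy` module stands in for TimeSeries and `Series` for `TimeArray`. For the real case, write `TimeSeries.TimeArray(df::DataFrame) = ...`, or `import TimeSeries: TimeArray` first.

```julia
# Hypothetical stand-in for TimeSeries: a type whose constructor we extend
module Toy
export Series

struct Series
    values::Vector{Float64}
end
end

using .Toy

# Series(t::Tuple) = ...
# With a bare name brought in only by `using`, the line above does not extend
# Toy.Series: depending on context, Julia rejects it ("must be explicitly
# imported to be extended") or creates an unrelated function in Main.

# Qualifying the name adds a method to the existing constructor instead
Toy.Series(t::Tuple) = Toy.Series(collect(Float64, t))

s = Series((1, 2, 3))
```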

milktrader commented 7 years ago

https://github.com/femtotrader/TimeSeriesReader.jl/issues/2

nalimilan commented 7 years ago

A good solution would be to support reading data from DataStreams, so that you don't need to reinvent the wheel for every possible data source.

milktrader commented 7 years ago

https://github.com/JuliaStats/TimeSeries.jl/issues/292

femtotrader commented 7 years ago

Converting a TimeArray to a DataFrame can be done using IterableTables.jl from @davidanthoff.

Once https://github.com/davidanthoff/IterableTables.jl/issues/46 is fixed, converting a DataFrame to a TimeArray should be easier too.

femtotrader commented 7 years ago

That's fixed!
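For readers on current package versions: TimeSeries.jl now implements the Tables.jl interface, so the round trip needs no helper function. A minimal sketch, assuming recent DataFrames and TimeSeries releases; the rows are taken from the sample data at the top of this issue.

```julia
using DataFrames, Dates, TimeSeries

df = DataFrame(
    Date  = [Date(2016, 9, 29), Date(2016, 9, 30), Date(2016, 10, 4)],
    Open  = [7.92, 7.96, 7.8],
    Close = [7.97, 7.9, 7.93],
)

# Any Tables.jl source works; `timestamp` names the column holding the dates
ta = TimeArray(df; timestamp = :Date)

# A TimeArray is itself a Tables.jl table, so DataFrame(ta) converts it back
df2 = DataFrame(ta)
```

The remaining columns become the TimeArray's value columns in their original order.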