JuliaStats / TimeSeries.jl

Time series toolkit for Julia

readtimearray with header #353

Closed xgdgsc closed 6 years ago

xgdgsc commented 6 years ago

I'm reading a typical CSV file generated by pandas' to_csv() in Python, which has a header like:

,open,close,high,low,volume,money
2017-11-02 15:00:00,63.35,63.32,63.35,63.32,97100.0,6148352.0
2017-11-03 09:31:00,63.32,63.09,63.32,63.08,53000.0,3351958.0

When I read it with

quotes = readtimearray("quotes.csv", format="yyyy-mm-dd HH:MM:SS")

it returns a TimeArray with these colnames:

6-element Array{String,1}:
 "63.35"     
 "63.32"     
 "63.35_1"   
 "63.32_1"   
 "97100.0"   
 "6.148352e6"

It ignores the header row and uses the first data row as the column names. Is there any way I can make it read the header?

I think people moving from the Python/pandas pipeline to Julia will run into many small issues like this. I now see that DataFrames.jl doesn't have a pandas-style time index. Do you have any suggestions on moving from Python/pandas to Julia? Which package should I use to hold the time series data (is TimeSeries.jl the most mature one for now)? Or should I use Pandas.jl to load the data and write my own calculation functions on pure Julia Arrays? Thanks.

iblislin commented 6 years ago

Well, simply pad the header line of your file from ,open,close,high,low,volume,money to datetime,open,close,high,low,volume,money (that is, give the first column a name) and it will work correctly.
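Since the file comes from the Python side, the padding can be scripted there. The following is a minimal sketch using only the standard library; the helper name name_index_column and the default column name "datetime" are illustrative assumptions, not part of TimeSeries.jl or pandas. It assumes a comma-delimited file whose header starts with an empty first field, as in the example above.

```python
import os
import tempfile


def name_index_column(path, index_name="datetime"):
    """Give the unnamed first CSV column a name, rewriting the file in place.

    Hypothetical helper: returns True if the header was changed,
    False if the first column already had a name.
    """
    # newline="" keeps the file's original line endings untouched.
    with open(path, "r", encoding="utf-8", newline="") as f:
        header = f.readline()
        rest = f.read()

    # pandas writes an empty first field when the index has no name,
    # so the header line starts with the delimiter.
    if not header.startswith(","):
        return False

    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(index_name + header)
        f.write(rest)
    return True


if __name__ == "__main__":
    # Small demonstration on a throwaway file.
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "quotes.csv")
        with open(demo, "w", encoding="utf-8", newline="") as f:
            f.write(",open,close,high,low,volume,money\n")
            f.write("2017-11-02 15:00:00,63.35,63.32,63.35,63.32,97100.0,6148352.0\n")
        name_index_column(demo)
        with open(demo, encoding="utf-8") as f:
            print(f.readline().rstrip())
```

If you control the export step, passing index_label="datetime" to to_csv() should produce the same header without any post-processing.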

iblislin commented 6 years ago

> I think people moving from the Python/pandas pipeline to Julia will run into many small issues like this. I now see that DataFrames.jl doesn't have a pandas-style time index. Do you have any suggestions on moving from Python/pandas to Julia? Which package should I use to hold the time series data (is TimeSeries.jl the most mature one for now)? Or should I use Pandas.jl to load the data and write my own calculation functions on pure Julia Arrays? Thanks.

TBH, TimeSeries.jl isn't mature enough for me. It still needs more work on integration with the rest of the ecosystem, such as DataStreams and TimeSeriesIO.jl (so that, for example, you can easily convert data between DataFrames and TimeSeries and get the functionality of both), and more work on plotting, etc.

> Which package should I use to hold the time series data (is TimeSeries.jl the most mature one for now)?

There is an alternative: https://github.com/dysonance/Temporal.jl. It offers a different style of API from TimeSeries.jl. (Personally, I want to switch to its style of indexing, and I'm working on it.)

> Or should I use Pandas.jl to load the data and write my own calculation functions on pure Julia Arrays?

That's another feasible way, but you would have to write your own functions, such as lag and lead, for Julia's Array. I'm also planning to make TimeSeries.jl export some utilities for Julia Arrays.

xgdgsc commented 6 years ago

Thanks. Do you plan to support an option to specify whether to read the header or not?

iblislin commented 6 years ago

Oh, I can do it.

iblislin commented 6 years ago

I sent it as #358