Open milktrader opened 7 years ago
Yes, I think this important functionality belongs in a separate package. Some other possible names ...
TimeSeriesTools (this might be too general)
TimeSeriesStreams
TimeSeriesIO is actually not bad for a package name either.
The point of the DataStreams framework is that you wouldn't have to depend on DataFrames, just on DataStreams.jl, and you'd get support for streaming from/to any source, like DataFrame, CSV, databases, etc.
Why not have DataStreams.jl support TimeSeries, like it supports DataFrames?
DataFrames does not support DataStreams.jl
I still have some difficulty understanding the functional differences between DataStreams.jl and IterableTables.jl.
Maybe @davidanthoff and @quinnj can help clarify.
In terms of goals the two packages are super similar. IterableTables.jl emerged out of the design of Query.jl, where the design of IterableTables.jl (namely iterators of NamedTuples.jl) forms the core of the most common backend.
In terms of design, the main difference currently is that IterableTables.jl only has one way of streaming data, namely row by row (where each row is a named tuple). DataStreams.jl offers two different options: you can stream either field by field or column by column.
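To make the three streaming styles concrete, here is a minimal sketch in plain Julia. It uses neither IterableTables.jl nor DataStreams.jl; the toy table and the `collect_rows`, `collect_fields`, and `collect_columns` helpers are hypothetical names that only illustrate the shape of the data each style hands to a sink.

```julia
# Plain-Julia illustration; this is NOT the API of either package.

# A toy column-oriented table: one vector per column.
cols = (timestamp = [1, 2, 3], price = [10.0, 10.5, 11.0])

# Row by row: an iterator that yields one named tuple per row.
rows = ((timestamp = t, price = p) for (t, p) in zip(cols.timestamp, cols.price))

# Hypothetical sink that consumes one row (named tuple) at a time.
function collect_rows(rows)
    out = NamedTuple[]
    for r in rows
        push!(out, r)
    end
    return out
end

# Hypothetical sink that consumes one cell (row, column, value) at a time.
function collect_fields(cols)
    cells = Tuple{Int,Symbol,Any}[]
    for row in 1:length(first(cols)), name in keys(cols)
        push!(cells, (row, name, getfield(cols, name)[row]))
    end
    return cells
end

# Hypothetical sink that consumes one whole column at a time.
function collect_columns(cols)
    return Dict(name => copy(getfield(cols, name)) for name in keys(cols))
end

by_row = collect_rows(rows)
by_field = collect_fields(cols)
by_col = collect_columns(cols)
```

Row-wise and field-wise sinks see one small piece of data per call, while a column-wise sink receives a whole vector at once, which is why a source that only exposes rows can feed the first two styles but not the third without buffering.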
There are more sinks and sources for IterableTables.jl currently (more than a dozen as of right now). In particular, if you implement the IterableTables.jl interface, you get automatic interop with the DataStreams.jl sources and sinks via their field-based streaming (but not with the column-by-column streaming). One other difference is in the details of the integration with Query.jl: while you can query a DataStreams.jl source, you should generally get a smoother experience if you query an IterableTables.jl source because there are fewer wrapper steps involved. The same holds if you materialize a query into some tabular structure.
There are also some user API differences that should be fairly obvious if you just look at the examples of how to use the two packages.
I don't think we have ever done a performance comparison between the two approaches.
This enhancement request (supporting DataStreams.jl) was initially submitted by @nalimilan: https://github.com/JuliaStats/TimeSeries.jl/issues/290#issuecomment-254007499
Pinging @quinnj. Maybe you can help with this?
Code to convert DataFrame to TimeArray and TimeArray to DataFrame can be found here: https://github.com/femtotrader/TimeSeriesIO.jl/blob/master/src/TimeSeriesIO.jl
It could help to build a TimeArray.Sink.
A TimeArray.Source (to convert from TimeArray to DataStream) would also be a nice feature to have.
If @milktrader doesn't want to add additional dependencies to TimeSeries.jl, this code can be part of TimeSeriesIO.jl.
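As a rough illustration of what such a two-way conversion involves, here is a dependency-free sketch. It is not the TimeSeriesIO.jl code and does not use the real TimeArray or DataFrame types: `ToyTimeArray`, `to_timearray`, `to_table`, and the `index` keyword are hypothetical stand-ins, and the table is modeled as a named tuple of columns.

```julia
using Dates

# Hypothetical stand-in for a time-indexed array (NOT the TimeSeries.jl type).
struct ToyTimeArray
    timestamp::Vector{Date}
    values::Matrix{Float64}
    colnames::Vector{Symbol}
end

# Table (named tuple of columns) -> ToyTimeArray.
# `index` names the column holding the dates; all other columns become values.
function to_timearray(table::NamedTuple; index::Symbol = :date)
    valuecols = [n for n in keys(table) if n != index]
    vals = hcat((Float64.(getfield(table, n)) for n in valuecols)...)
    return ToyTimeArray(collect(getfield(table, index)), vals, valuecols)
end

# ToyTimeArray -> table: the date column first, then one column per series.
function to_table(ta::ToyTimeArray; index::Symbol = :date)
    colsyms = (index, ta.colnames...)
    coldata = (ta.timestamp, (ta.values[:, j] for j in 1:size(ta.values, 2))...)
    return NamedTuple{colsyms}(coldata)
end

table = (date = [Date(2017, 1, 2), Date(2017, 1, 3)],
         Open = [10.0, 10.5],
         Close = [10.4, 10.2])

ta = to_timearray(table)
roundtrip = to_table(ta)
```

The key design point is that one column has to be singled out as the time index, while the remaining columns are packed into a single numeric matrix; a sink or source for a real time series type would need the same information from the caller.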
Related issues: