JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Integration with Tables.jl #355

Closed: scls19fr closed this issue 3 weeks ago

scls19fr commented 9 months ago

Hello,

I'd like to know whether integration with Tables.jl https://tables.juliadata.org/dev/ has been considered, for example to export a slice of a YAXArray to a DataFrames.DataFrame, TimeSeries.TimeArray or TSFrames.TSFrame. Maybe YAXArray could be both a source and a sink. Any opinions?

Kind regards

lazarusA commented 9 months ago

It looks like this is already supported through DimensionalData.jl's Tables.jl interface (DimTable): https://rafaqz.github.io/DimensionalData.jl/dev/reference/?h=dimtable#tablesjltabletraitsjl-interface. Maybe we could just try it out with some examples and, if it works, add them to the docs. What simple examples do you have in mind?
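A minimal sketch of the "YAXArray as a Tables.jl source" direction, assuming a recent YAXArrays.jl in which YAXArray is a DimensionalData AbstractDimArray, so that DimensionalData's DimTable applies to it. The timestamps, column names and values are made up, and Tables.jl is assumed to be installed so the table can be materialised with Tables.columntable.

```julia
using YAXArrays
using DimensionalData: Dim, Ti, DimTable
using Dates
using Tables

# Tiny made-up 2-D YAXArray: 3 hourly timestamps × 2 price columns
t = DateTime(2020, 1, 1):Hour(1):DateTime(2020, 1, 1, 2)
yax = YAXArray((Ti(t), Dim{:colnames}([:Open, :Close])), [1.0 2.0; 3.0 4.0; 5.0 6.0])

# DimTable exposes the array through the Tables.jl interface (long format):
# one column per dimension plus one column holding the values
tbl = DimTable(yax)
cols = Tables.columntable(tbl)   # NamedTuple of 3 column vectors, 6 rows each
```

Each row holds one (timestamp, column name) pair and its value. With DataFrames.jl loaded, `DataFrame(yax)` should produce the same long layout, since any Tables.jl source can be passed to the DataFrame constructor.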

scls19fr commented 9 months ago

I see two kinds of examples.

YAXArray as sink: download data for 3 symbols from MarketData.jl (for example) and build a "cube".

YAXArray as source: take the previously obtained cube, swap 2 dimensions, and get a DataFrame of OHLCV values at a given date, or a TSFrame of close prices with one column per symbol, etc.
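A sketch of the "YAXArray as source" steps, using a small synthetic cube instead of downloaded data. The dimension names (Ti, colnames, Stock), the symbols and the random values are illustrative assumptions; the sketch relies on DimensionalData's permutedims, At selectors and keyword indexing, plus Tables.columntable.

```julia
using YAXArrays
using DimensionalData: Dim, Ti, At
using Dates
using Tables

# Hypothetical cube: 4 hourly timestamps × 5 OHLCV fields × 3 symbols, random values
t = DateTime(2020, 1, 1):Hour(1):DateTime(2020, 1, 1, 3)
fields = [:Open, :High, :Low, :Close, :Volume]
symbols = ["Stock1", "Stock2", "Stock3"]
cube = YAXArray((Ti(t), Dim{:colnames}(fields), Dim{:Stock}(symbols)), rand(4, 5, 3))

# Swap the field and symbol dimensions: Ti × Stock × colnames
swapped = permutedims(cube, (Ti, Dim{:Stock}, Dim{:colnames}))

# OHLCV values of every symbol at one timestamp: a Stock × colnames slice
snapshot = swapped[Ti=At(DateTime(2020, 1, 1, 2))]

# Close prices only: a Ti × Stock array
closes = cube[colnames=At(:Close)]

# Long-format Tables.jl view of the snapshot (one row per Stock/colnames pair)
snapshot_tbl = Tables.columntable(snapshot)
```

The tables produced this way are long-format. DimensionalData's DimTable also has a `layersfrom` keyword that turns one dimension into separate columns, which may give the wide OHLCV or close-price layout directly; that variant is not exercised here.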

These libraries shouldn't be added as dependencies of YAXArrays.jl, so you will probably have to use package extensions: https://youtu.be/TiIZlQhFzyk?si=Lvm6RSp3WjuqtV-o

Another idea, if you don't want to rely on remote data, would be to generate similar data with a random walk.
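Following the random-walk idea, a sketch that builds an offline Ti × Stock cube of close prices without MarketData.jl. The helper name `random_walk_cube`, its keyword defaults and the hourly spacing starting at 2020-01-01 are hypothetical choices; only close prices are simulated, not full OHLCV bars.

```julia
using YAXArrays
using DimensionalData: Dim, Ti
using Dates
using Random

# Hypothetical helper: one Gaussian random walk of prices per symbol,
# stored as a Ti × Stock YAXArray (all values are synthetic)
function random_walk_cube(symbols; n=500, start=100.0, sigma=1.0, rng=Random.default_rng())
    t = DateTime(2020, 1, 1) .+ Hour.(0:n-1)          # hourly timestamps
    steps = sigma .* randn(rng, n, length(symbols))    # i.i.d. Gaussian increments
    prices = start .+ cumsum(steps; dims=1)            # accumulate along the time axis
    return YAXArray((Ti(t), Dim{:Stock}(symbols)), prices)
end

cube = random_walk_cube(["Stock1", "Stock2", "Stock3"]; rng=MersenneTwister(42))
```

Passing an explicit `rng` makes the generated cube reproducible, which is convenient for documentation examples.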

femtotrader commented 5 months ago

Here is some random data to build a 3D cube:

julia> using MarketData

julia> data = Dict("Stock1" => random_ohlcv(), "Stock2" => random_ohlcv(), "Stock3" => random_ohlcv())
Dict{String, TimeArray{Float64, 2, DateTime, Matrix{Float64}}} with 3 entries:
  "Stock2" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
  "Stock3" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
  "Stock1" => 500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00

julia> data["Stock1"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬────────┬────────┬────────┐
│                     │ Open   │ High   │ Low    │ Close  │ Volume │
├─────────────────────┼────────┼────────┼────────┼────────┼────────┤
│ 2020-01-01T00:00:00 │ 654.02 │ 657.91 │ 652.74 │ 657.91 │   47.8 │
│ 2020-01-01T01:00:00 │ 657.59 │ 663.22 │ 656.93 │ 658.29 │   55.2 │
│ 2020-01-01T02:00:00 │ 658.09 │  662.2 │  649.3 │  649.3 │    3.7 │
│ 2020-01-01T03:00:00 │ 649.57 │ 649.57 │ 634.44 │ 636.65 │   13.9 │
│ 2020-01-01T04:00:00 │ 637.35 │ 639.31 │ 635.88 │ 635.88 │   35.8 │
│ 2020-01-01T05:00:00 │  635.6 │ 636.46 │ 626.38 │ 628.16 │   68.8 │
│ 2020-01-01T06:00:00 │ 627.61 │ 629.29 │ 622.35 │ 629.29 │   27.1 │
│ 2020-01-01T07:00:00 │ 630.18 │ 637.41 │ 630.18 │ 634.59 │   39.0 │
│ 2020-01-01T08:00:00 │ 634.84 │ 635.42 │ 626.56 │ 626.56 │   26.7 │
│ 2020-01-01T09:00:00 │ 625.98 │ 627.14 │ 622.37 │ 626.96 │    8.7 │
│ 2020-01-01T10:00:00 │ 627.76 │ 636.52 │ 627.67 │  634.8 │   79.7 │
│ 2020-01-01T11:00:00 │ 634.71 │ 635.36 │ 629.06 │ 629.65 │   70.6 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮    │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  793.7 │ 795.42 │ 785.97 │ 786.96 │   63.8 │
│ 2020-01-21T09:00:00 │ 787.38 │  791.3 │ 785.83 │ 785.83 │    0.0 │
│ 2020-01-21T10:00:00 │ 786.02 │ 793.74 │ 784.98 │ 793.74 │   71.2 │
│ 2020-01-21T11:00:00 │ 794.73 │ 795.11 │ 790.71 │ 790.71 │   76.3 │
│ 2020-01-21T12:00:00 │ 789.92 │ 790.87 │ 786.32 │ 787.38 │   42.7 │
│ 2020-01-21T13:00:00 │ 788.26 │ 788.33 │ 782.01 │ 782.48 │   61.6 │
│ 2020-01-21T14:00:00 │ 781.58 │ 782.98 │ 777.93 │ 782.13 │   31.2 │
│ 2020-01-21T15:00:00 │ 781.66 │ 782.95 │ 774.77 │ 779.68 │   44.5 │
│ 2020-01-21T16:00:00 │ 779.35 │ 784.95 │ 773.43 │ 784.95 │   34.2 │
│ 2020-01-21T17:00:00 │ 785.61 │ 789.73 │ 783.63 │  787.8 │   50.2 │
│ 2020-01-21T18:00:00 │ 787.51 │ 794.35 │ 787.37 │ 792.83 │    3.5 │
│ 2020-01-21T19:00:00 │ 792.87 │  794.0 │ 790.51 │ 793.18 │   16.9 │
└─────────────────────┴────────┴────────┴────────┴────────┴────────┘
                                                    476 rows omitted

julia> data["Stock2"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬────────┬────────┬────────┐
│                     │ Open   │ High   │ Low    │ Close  │ Volume │
├─────────────────────┼────────┼────────┼────────┼────────┼────────┤
│ 2020-01-01T00:00:00 │  155.8 │ 167.25 │ 154.93 │ 165.42 │   40.8 │
│ 2020-01-01T01:00:00 │ 164.48 │ 167.51 │ 162.54 │ 165.19 │   29.5 │
│ 2020-01-01T02:00:00 │ 165.66 │ 171.29 │ 164.89 │ 165.11 │   55.0 │
│ 2020-01-01T03:00:00 │ 164.35 │ 169.62 │ 164.35 │ 165.48 │   13.2 │
│ 2020-01-01T04:00:00 │ 165.26 │ 168.44 │ 164.23 │ 165.34 │   97.3 │
│ 2020-01-01T05:00:00 │ 166.05 │ 171.79 │  166.0 │  170.8 │   62.7 │
│ 2020-01-01T06:00:00 │ 170.63 │ 174.14 │ 170.17 │ 174.02 │   66.8 │
│ 2020-01-01T07:00:00 │ 174.49 │ 179.76 │ 174.49 │ 178.54 │   40.5 │
│ 2020-01-01T08:00:00 │  177.8 │ 179.85 │ 175.84 │ 176.01 │   63.8 │
│ 2020-01-01T09:00:00 │ 176.92 │ 181.39 │ 174.55 │ 176.26 │   50.3 │
│ 2020-01-01T10:00:00 │ 175.69 │ 176.43 │ 171.21 │ 172.28 │   59.0 │
│ 2020-01-01T11:00:00 │ 172.14 │ 177.01 │ 168.63 │ 175.23 │   90.2 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮    │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  149.9 │ 151.54 │ 146.31 │ 150.34 │   98.0 │
│ 2020-01-21T09:00:00 │ 150.64 │ 151.86 │ 145.85 │ 148.63 │   89.7 │
│ 2020-01-21T10:00:00 │ 149.62 │ 152.04 │ 144.73 │ 149.19 │   87.3 │
│ 2020-01-21T11:00:00 │ 148.48 │ 150.29 │ 140.75 │ 141.65 │   35.2 │
│ 2020-01-21T12:00:00 │ 142.39 │ 142.39 │ 137.89 │ 142.14 │   47.5 │
│ 2020-01-21T13:00:00 │ 142.88 │ 151.71 │ 140.67 │ 150.35 │   67.1 │
│ 2020-01-21T14:00:00 │ 150.02 │ 152.85 │ 148.64 │ 150.31 │   12.8 │
│ 2020-01-21T15:00:00 │ 150.84 │ 157.52 │ 150.84 │ 156.68 │   29.6 │
│ 2020-01-21T16:00:00 │ 157.44 │ 165.22 │ 157.44 │ 163.09 │   74.6 │
│ 2020-01-21T17:00:00 │ 163.36 │ 167.37 │ 163.08 │ 165.92 │   56.6 │
│ 2020-01-21T18:00:00 │ 166.68 │ 174.08 │ 166.68 │ 171.58 │   22.0 │
│ 2020-01-21T19:00:00 │ 170.61 │ 174.85 │ 169.47 │ 171.41 │   29.6 │
└─────────────────────┴────────┴────────┴────────┴────────┴────────┘
                                                    476 rows omitted

julia> data["Stock3"]
500×5 TimeArray{Float64, 2, DateTime, Matrix{Float64}} 2020-01-01T00:00:00 to 2020-01-21T19:00:00
┌─────────────────────┬────────┬────────┬───────┬────────┬────────┐
│                     │ Open   │ High   │ Low   │ Close  │ Volume │
├─────────────────────┼────────┼────────┼───────┼────────┼────────┤
│ 2020-01-01T00:00:00 │  44.15 │  46.02 │ 40.92 │  44.89 │   24.8 │
│ 2020-01-01T01:00:00 │  45.06 │  50.57 │ 43.49 │  49.09 │   45.2 │
│ 2020-01-01T02:00:00 │  49.96 │  54.79 │ 48.06 │  53.76 │   21.9 │
│ 2020-01-01T03:00:00 │   53.2 │  59.82 │ 52.42 │  56.41 │    6.2 │
│ 2020-01-01T04:00:00 │  56.04 │  59.03 │ 53.74 │  54.75 │   92.3 │
│ 2020-01-01T05:00:00 │   54.8 │  56.29 │ 50.81 │  55.76 │   52.2 │
│ 2020-01-01T06:00:00 │  56.34 │   56.7 │ 52.95 │  53.04 │   72.6 │
│ 2020-01-01T07:00:00 │  52.87 │  53.49 │ 46.98 │  46.98 │   21.1 │
│ 2020-01-01T08:00:00 │  46.51 │  50.58 │ 44.67 │  49.95 │   52.5 │
│ 2020-01-01T09:00:00 │  49.37 │  49.68 │ 43.78 │  45.73 │   68.3 │
│ 2020-01-01T10:00:00 │  45.24 │  50.73 │ 45.24 │  50.73 │   45.9 │
│ 2020-01-01T11:00:00 │  51.21 │  53.11 │ 48.01 │  52.05 │   44.9 │
│          ⋮          │   ⋮    │   ⋮    │   ⋮   │   ⋮    │   ⋮    │
│ 2020-01-21T08:00:00 │  85.54 │   88.5 │ 84.51 │  86.84 │   91.9 │
│ 2020-01-21T09:00:00 │  86.63 │  86.63 │ 80.47 │  84.93 │   49.2 │
│ 2020-01-21T10:00:00 │   85.7 │  87.37 │ 79.86 │  80.99 │   59.1 │
│ 2020-01-21T11:00:00 │   81.5 │  83.25 │ 77.61 │  79.87 │   25.4 │
│ 2020-01-21T12:00:00 │  80.07 │  80.07 │ 74.48 │  74.48 │   65.7 │
│ 2020-01-21T13:00:00 │  74.04 │  76.15 │ 71.99 │   75.5 │   84.9 │
│ 2020-01-21T14:00:00 │  75.42 │  82.62 │ 75.42 │  78.98 │   35.5 │
│ 2020-01-21T15:00:00 │  78.84 │  80.16 │ 75.16 │  75.52 │   70.6 │
│ 2020-01-21T16:00:00 │  75.63 │  75.63 │ 70.72 │  73.43 │   46.1 │
│ 2020-01-21T17:00:00 │   73.1 │  75.34 │  71.0 │  71.77 │   14.9 │
│ 2020-01-21T18:00:00 │  72.43 │  74.53 │ 68.28 │  68.28 │   81.8 │
│ 2020-01-21T19:00:00 │  68.24 │  68.79 │ 63.75 │   67.1 │   96.2 │
└─────────────────────┴────────┴────────┴───────┴────────┴────────┘
                                                   476 rows omitted

Unfortunately, I don't know how to get this into YAXArrays.jl.

felixcremer commented 5 months ago

You could construct a YAXArray from every separate stock with this:

julia> s = data["Stock1"];

julia> d = (Ti(timestamp(s)), Dim{:colnames}(colnames(s)))

julia> YAXArray(d, values(s));

This would construct a two-dimensional YAXArray from the data in the TimeArray. If you would like a three-dimensional YAXArray with a dimension for the stocks, you could use cat(yaxlist, dims=Dim{:Stock}(["1", "2", "3"])). Alternatively, you could use a Dataset, which behaves more like a Dict and can hold arrays with different dimensions.
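For the Dataset alternative, a minimal sketch assuming YAXArrays' keyword constructor `Dataset(; name=array, ...)` and property access such as `ds.Stock1`. The stock names and random arrays are made up; Stock3 deliberately holds only a one-dimensional close-price series, to show that the arrays in a Dataset need not share the same shape.

```julia
using YAXArrays
using DimensionalData: Dim, Ti, At
using Dates

t = DateTime(2020, 1, 1):Hour(1):DateTime(2020, 1, 1, 3)
fields = [:Open, :High, :Low, :Close, :Volume]

# Made-up per-stock arrays: two full OHLCV tables and one close-price series
stock1 = YAXArray((Ti(t), Dim{:colnames}(fields)), rand(4, 5))
stock2 = YAXArray((Ti(t), Dim{:colnames}(fields)), rand(4, 5))
stock3 = YAXArray((Ti(t),), rand(4))

# Dict-like container: arrays are stored by name and keep their own dimensions
ds = Dataset(; Stock1=stock1, Stock2=stock2, Stock3=stock3)

# Look an array up by name, then select a column as usual
close1 = ds.Stock1[colnames=At(:Close)]
```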

femtotrader commented 5 months ago
Running

using YAXArrays
d = (Ti(timestamp(s)), Dim{:colnames}(colnames(s)))

fails with

ERROR: UndefVarError: `Ti` not defined

femtotrader commented 5 months ago

Adding

using DimensionalData: DimensionalData as DD

and then using DD.Ti should help.

felixcremer commented 5 months ago

Yes, sorry, I forgot the import of DD. Is this what you had in mind?
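For reference, felixcremer's two-dimensional construction with the DimensionalData import added, as a self-contained sketch. It assumes, as femtotrader's snippets in this thread do, that `using MarketData` makes `random_ohlcv`, `timestamp` and `colnames` available.

```julia
using MarketData                                 # random_ohlcv, timestamp, colnames
using YAXArrays
using DimensionalData: DimensionalData as DD

s = random_ohlcv()                               # 500×5 TimeArray of synthetic OHLCV data

# Time axis from the TimeArray timestamps, second axis from its column names
d = (DD.Ti(timestamp(s)), DD.Dim{:colnames}(colnames(s)))
yax = YAXArray(d, values(s))                     # 500×5 YAXArray

# Select one column by name, e.g. the close prices
close_prices = yax[colnames=DD.At(:Close)]
```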

femtotrader commented 5 months ago

What I had in mind was a full example, like this:

using MarketData
using DataStructures
using YAXArrays
using DimensionalData: DimensionalData as DD

d_data = OrderedDict("Stock1" => random_ohlcv(), "Stock2" => random_ohlcv(), "Stock3" => random_ohlcv())

yaxlist = YAXArray[]
for (stock, stock_data) in d_data
    d = (DD.Ti(timestamp(stock_data)), Dim{:colnames}(colnames(stock_data)))
    yax = YAXArray(d, values(stock_data))
    push!(yaxlist, yax)
end
data = cat(yaxlist, dims=Dim{:Stock}(keys(d_data)))

but the last line fails:

ERROR: MethodError: no method matching iterate(::Dim{:Stock, Base.KeySet{String, OrderedDict{String, TimeArray{Float64, 2, DateTime, Matrix{Float64}}}}})

Closest candidates are:
  iterate(::Base.AsyncGenerator, ::Base.AsyncGeneratorState)
   @ Base asyncmap.jl:362
  iterate(::Base.AsyncGenerator)
   @ Base asyncmap.jl:362
  iterate(::DataStructures.TrieIterator)
   @ DataStructures C:\Users\femto\.julia\packages\DataStructures\95DJa\src\trie.jl:112
  ...

Collecting the keys into a Vector fails too, with a different error:

data = cat(yaxlist, dims=Dim{:Stock}(collect(keys(d_data))))

ERROR: MethodError: no method matching isless(::String, ::Int64)

Closest candidates are:
  isless(::Missing, ::Any)
   @ Base missing.jl:87
  isless(::Any, ::Missing)
   @ Base missing.jl:88
  isless(::ForwardDiff.Dual{Tx}, ::Integer) where Tx
   @ ForwardDiff C:\Users\femto\.julia\packages\ForwardDiff\PcZ48\src\dual.jl:144
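Both errors most likely come from Base.cat rather than DimensionalData's method. `cat(yaxlist, dims=...)` passes the whole Vector as a single argument, so DimensionalData's `cat`, which expects the arrays as separate arguments, is not selected, and Base then tries to treat the Dim as a collection of integer dimension numbers: a Dim wrapping a KeySet cannot be iterated, and iterating a Dim wrapping a Vector yields Strings that get compared with 0. A sketch of a fix, assuming a YAXArrays version where YAXArray is an AbstractDimArray and DimensionalData's `cat` accepts a new Dimension carrying lookup values: splat the list and pass a Vector of stock names.

```julia
using MarketData                                 # random_ohlcv, timestamp, colnames
using DataStructures                             # OrderedDict keeps the stock order
using YAXArrays
using DimensionalData: DimensionalData as DD

d_data = OrderedDict("Stock1" => random_ohlcv(), "Stock2" => random_ohlcv(), "Stock3" => random_ohlcv())

# One Ti × colnames YAXArray per stock (they share the same timestamps and columns)
yaxlist = [YAXArray((DD.Ti(timestamp(s)), DD.Dim{:colnames}(colnames(s))), values(s))
           for s in values(d_data)]

# Splat the arrays into separate arguments and give the new Stock dimension
# a concrete Vector of labels (a KeySet is not an AbstractVector)
data = cat(yaxlist...; dims=DD.Dim{:Stock}(collect(keys(d_data))))
```

The result should be a 500×5×3 cube whose third dimension is labelled by stock name, so `data[Stock=DD.At("Stock2")]` recovers the second stock's table.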