JuliaHEP / AwkwardArray.jl

Awkward Array in Julia mirrors the Python library, enabling effortless zero-copy data exchange between Julia and Python
https://juliahep.github.io/AwkwardArray.jl/dev/
MIT License
32 stars 2 forks source link

[feat] need better `to_layout` and `from_iter` design #35

Open Moelf opened 1 year ago

Moelf commented 1 year ago

I'm test driving using a function from AK as a sink to UnROOT:

julia> using UnROOT, AwkwardArray

julia> t = LazyTree(UnROOT.samplefile("NanoAODv5_sample.root"), "Events", r"Muon_(pt|eta|phi)$");

julia> t[1:10]
 Row │ Muon_phi         Muon_pt          Muon_eta
     │ Vector{Float32}  Vector{Float32}  Vector{Float32}
─────┼───────────────────────────────────────────────────
 1   │ []               []               []
 2   │ [-0.305, 0.99]   [19.9, 15.3]     [0.53, 0.229]
 3   │ []               []               []
 4   │ []               []               []
 5   │ []               []               []
 6   │ []               []               []
 7   │ [2.71, 1.37]     [22.2, 4.43]     [-1.13, 1.98]
 8   │ []               []               []
 9   │ []               []               []
 10  │ []               []               []

julia> AwkwardArray.from_iter([collect(t[i]) for i =1:10])
10-element AwkwardArray.RecordArray{(:Muon_phi, :Muon_pt, :Muon_eta), Tuple{AwkwardArray.ListOffsetArray{Vector{Int64}, AwkwardArray.PrimitiveArray{Float32, Vector{Float32}, :default}, :default}, AwkwardArray.ListOffsetArray{Vector{Int64}, AwkwardArray.PrimitiveArray{Float32, Vector{Float32}, :default}, :default}, AwkwardArray.ListOffsetArray{Vector{Int64}, AwkwardArray.PrimitiveArray{Float32, Vector{Float32}, :default}, :default}}, :default}:
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [-0.30541992f0, ...], Muon_pt: [...], Muon_eta: [...]}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [2.713379f0, 1.3676758f0], Muon_pt: [...], Muon_eta: [...]}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}
 {Muon_phi: [], Muon_pt: [], Muon_eta: []}

This works but is not very ergonomic and also involves copying everything twice.


Because LazyTree is already iterable, I tried to get the following to work

julia> AwkwardArray.from_iter(t)
# errors because `to_layout` doesn't understand `LazyEvent`

# so I implement this to say "LazyEvent is same as NamedTuple"
julia> function AwkwardArray.layout_for(x::Type{UnROOT.LazyEvent{T}}) where T
           AwkwardArray.layout_for(T)
       end

then I hit a infinite recursion:

ERROR: StackOverflowError:
Stacktrace:
      [1] push!(a::AwkwardArray.RecordArray{(:Muon_phi, :Muon_pt, :Muon_eta), Tuple{…}, :default}, iter::UnROOT.LazyEvent)
        @ Base ./array.jl:1186
      [2] append!
        @ AwkwardArray ~/.julia/dev/AwkwardArray/src/AwkwardArray.jl:150 [inlined]
--- the last 2 lines are repeated 52332 more times ---
 [104667] push!(a::AwkwardArray.RecordArray{(:Muon_phi, :Muon_pt, :Muon_eta), Tuple{…}, :default}, iter::UnROOT.LazyEvent)
        @ Base ./array.jl:1186
 [104668] from_iter(input::LazyTree with 3 branches:
Muon_phi, Muon_pt, Muon_eta
)
        @ AwkwardArray ~/.julia/dev/AwkwardArray/src/AwkwardArray.jl:2358
Some type information was truncated. Use `show(err)` to see complete types.

to me it all just means that:

What I really want to do here is to tell AwkwardArray:

Moelf commented 1 year ago

A possible implementation is to define

# need a better name
AwkwardArray.akconvert(x) = x #default no-op

and then define from_iter to something like:

function from_iter(input)
    ItemType = eltype(input)
    AwkwardType = layout_for(ItemType)
    out = AwkwardType()
    for item in input
        push!(out, Awkward.akconvert(item))
    end
    out
end

and the user-facing interface/contract for custom type is: if you have a custom type, you need to specialize two functions:

  1. AwkwardArray.layout_for(::Type{MyType}) which must return a type that is (eventually) recognized by Awkward
  2. AwkwardArray.akconvert(x::MyType) which must convert to the same type as returned by 1
jpivarski commented 1 year ago

I'm open to any such changes, even if they change the interface. Nothing is firmly established yet.