JuliaHEP / UnROOT.jl

Native Julia I/O package to work with CERN ROOT files objects (TTree and RNTuple)
https://juliahep.github.io/UnROOT.jl/
MIT License

Redesign version management of streamers #4

Open tamasgal opened 4 years ago

tamasgal commented 4 years ago

The current design of the streamer type hierarchy (including versioning) works in a way that I don't find nice at all. At some point I thought it was a good solution, but now I see a lot of code repetition.

A lot of streamer logic is currently implemented in bootstrap.jl and the overall implementation of a streamer and all of its versions is done like this:

abstract type TAttLine <: ROOTStreamedObject end
struct TAttLine_1 <: TAttLine end
function readfields!(io, fields, T::Type{TAttLine_1})
    fields[:fLineColor] = readtype(io, Int16)
    fields[:fLineStyle] = readtype(io, Int16)
    fields[:fLineWidth] = readtype(io, Int16)
end
struct TAttLine_2 <: TAttLine end
function readfields!(io, fields, T::Type{TAttLine_2})
    fields[:fLineColor] = readtype(io, Int16)
    fields[:fLineStyle] = readtype(io, Int16)
    fields[:fLineWidth] = readtype(io, Int16)
end

As seen above, the ROOTStreamedObject is the supertype (which is just a simple abstract type) and there is (always) one single subtype (another abstract type) representing a specific streamer.

Different versions are defined by appending _VERSION to the type name and creating an empty struct. The readfields! method then defines how to read its fields.

Both versions of TAttLine (v1 and v2) have the same fields with the same types, so in this case it's just code repetition.

Note that these definitions should be created dynamically in the future, but the overall layout is of course the same.

During the read-in of a ROOT file, UnROOT checks whether there is a streamer with the given name and version appended (e.g. TAttLine_2) and complains if there is none (later it will define it from the streamer info).
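A minimal sketch of that name-plus-version lookup, assuming the streamer types live in a known module. The helper name `streamerfor` is hypothetical and not part of UnROOT:

```julia
abstract type ROOTStreamedObject end
abstract type TAttLine <: ROOTStreamedObject end
struct TAttLine_1 <: TAttLine end
struct TAttLine_2 <: TAttLine end

# Hypothetical helper (not UnROOT API): resolve "<name>_<version>" to a type
# defined in module `mod`, and complain if no such streamer exists.
function streamerfor(mod::Module, name::AbstractString, version::Integer)
    sym = Symbol(name, "_", version)
    isdefined(mod, sym) || error("No streamer defined for $name version $version")
    return getfield(mod, sym)
end

streamerfor(@__MODULE__, "TAttLine", 2)  # returns the type TAttLine_2
```

Every version needs its own empty struct for this lookup to succeed, which is where the repetition comes from.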

A simplification of this system which allows a nice bootstrapping (no code repetition) and a better version management would be something like the following:

import Base: getproperty

abstract type ROOTStreamedObject end

function Base.getproperty(obj::ROOTStreamedObject, sym::Symbol)
    # getfield bypasses the getproperty overload, so this cannot recurse
    fields = getfield(obj, :fields)
    if !haskey(fields, sym)
        error("Type $(typeof(obj)) has no field $(String(sym))")
    end
    fields[sym]
end

struct TAttLine{V} <: ROOTStreamedObject  # this should be a macro
    fields::Dict{Symbol, Any}
end

where the actual struct definition could of course be simplified via a macro like @rootstreamer TAttLine or so.
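The `@rootstreamer` macro mentioned above does not exist yet. One way it could look, as a sketch that assumes `ROOTStreamedObject` is visible where the macro is called:

```julia
abstract type ROOTStreamedObject end

# Forward property access to the `fields` dictionary.
function Base.getproperty(obj::ROOTStreamedObject, sym::Symbol)
    fields = getfield(obj, :fields)
    haskey(fields, sym) || error("Type $(typeof(obj)) has no field $(String(sym))")
    return fields[sym]
end

# Hypothetical macro: `@rootstreamer Foo` expands to
#     struct Foo{V} <: ROOTStreamedObject
#         fields::Dict{Symbol, Any}
#     end
# The whole expression is escaped so that all names resolve in the caller's scope.
macro rootstreamer(name)
    esc(quote
        struct $name{V} <: ROOTStreamedObject
            fields::Dict{Symbol, Any}
        end
    end)
end

@rootstreamer TAttLine
@rootstreamer TAttFill

line = TAttLine{2}(Dict(:fLineColor => 42))
line.fLineColor  # 42
```

The version parameter `V` is not used inside the struct; it only exists so that methods can dispatch on it.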

The readfields!() methods can then be defined using value type dispatch:

function readfields!(io, fields, ::Type{TAttLine{1}})
    fields[:fLineColor] = readtype(io, Int16)
    fields[:fLineStyle] = readtype(io, Int16)
    fields[:fLineWidth] = readtype(io, Int16)
end

# Version 2 has the same layout, so it simply forwards to version 1
readfields!(io, fields, ::Type{TAttLine{2}}) = readfields!(io, fields, TAttLine{1})
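To try the version dispatch end to end, here is a self-contained sketch. The `readtype` below is only a stand-in that reads big-endian integers (ROOT files are big-endian), and `unpack` is a hypothetical driver, not UnROOT's actual API:

```julia
abstract type ROOTStreamedObject end

struct TAttLine{V} <: ROOTStreamedObject
    fields::Dict{Symbol, Any}
end

# Stand-in for UnROOT's readtype: read one big-endian integer from the stream.
readtype(io::IO, ::Type{T}) where {T<:Integer} = ntoh(read(io, T))

function readfields!(io, fields, ::Type{TAttLine{1}})
    fields[:fLineColor] = readtype(io, Int16)
    fields[:fLineStyle] = readtype(io, Int16)
    fields[:fLineWidth] = readtype(io, Int16)
end

# Same layout as version 1, so forward instead of repeating the body.
readfields!(io, fields, ::Type{TAttLine{2}}) = readfields!(io, fields, TAttLine{1})

# Hypothetical driver: the version is only known at run time.
# Int(version) matters because TAttLine{1} and TAttLine{Int16(1)} are different types.
function unpack(io::IO, version::Integer)
    T = TAttLine{Int(version)}
    fields = Dict{Symbol, Any}()
    readfields!(io, fields, T)
    return T(fields)
end

# Three big-endian Int16 values: 42, 5, 23
io = IOBuffer(UInt8[0x00, 0x2a, 0x00, 0x05, 0x00, 0x17])
line = unpack(io, Int16(2))
```

An unknown version, such as `unpack(io, 3)` here, fails with a `MethodError` from `readfields!`, which is the point where a definition generated from the streamer info could step in.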

Here is a short demo of the instantiation:

julia> tattline = TAttLine{1}(Dict(:fLineColor => 42, :fLineStyle => 5, :fLineWidth => 23))
TAttLine{1}(Dict{Symbol,Any}(:fLineWidth => 23,:fLineColor => 42,:fLineStyle => 5))

julia> tattline.fLineWidth
23

julia> tattline.fFoo
ERROR: Type TAttLine{1} has no field fFoo

This is just an idea, any feedback is highly appreciated ;)

Moelf commented 4 years ago

> created dynamically (later it will define it from the streamer info)

I have some time and have started looking at the code base again. I'm wondering whether it's possible to document this in an implementation-independent way, in a Developers / Internals section of the docs, so people can understand how it works conceptually.

In particular, I have yet to understand how to make a struct for a streamer that we have never seen before. My guess is that the name of the streamer and the number of bytes for each field are encoded in TKey (?) or somewhere similar. If that is the case, does this mean we don't have to bootstrap anything except TKey (which is ~fixed) and the first 37 (or 75) bytes of the .root file?

tamasgal commented 3 years ago

Oh dear, I somehow forgot to answer this. I thought I did 🤔

Anyways, the construction of the struct and the complete parsing of the structures are very complex. There is a Google group in which I also participate, and we were already thinking about writing a book or so about ROOT I/O to consolidate our knowledge. But I don't know about the time scale, probably not this year...

So indeed, it's hard to answer your question, as it also depends on other fields in TKey which have magic values, for example the ones listed here: https://github.com/tamasgal/UnROOT.jl/blob/1d892e7b9159e5425ea91a6cc16dabc1caa09b77/src/constants.jl#L27

tamasgal commented 2 years ago

Worth mentioning: a macro posted by Chris Rackauckas on Twitter:

macro def(name, definition)
  return quote
      macro $(esc(name))()
          esc($(Expr(:quote, definition)))
      end
  end
end

@def myfields begin
   x::Int
   y::Int
end

struct A
  @myfields
  z::Int
end

With this macro, anyone can paste code around at compile time.