Open peremato opened 10 months ago
@tamasgal this thing hits fID equals -2, I think we're missing something fundamental here
Actually, the only missing thing in this case is the leaf type support for vector<unsigned int> (see https://github.com/JuliaHEP/UnROOT.jl/pull/299). I should have added those, so you can blame me ;) The vector<string> stuff is already supported. You don't need a custom streamer.
With https://github.com/JuliaHEP/UnROOT.jl/pull/299 the following works (without it, reading the m_collectionIDs part fails):
julia> using UnROOT
julia> f = ROOTFile("/Users/tamasgal/Downloads/Output_REC.root")
ROOTFile with 3 entries and 51 streamers.
/Users/tamasgal/Downloads/Output_REC.root
├─ runs (TTree)
│  └─ "PARAMETERS"
├─ events (TTree)
│  ├─ "AllCaloHitContributionsCombined"
│  ├─ "_AllCaloHitContributionsCombined_particle"
│  ├─ "BeamCal_Hits"
│  ├─ "⋮"
│  ├─ "YokeEndcapCollection"
│  ├─ "_YokeEndcapCollection_contributions"
│  └─ "PARAMETERS"
└─ podio_metadata (TTree)
   ├─ "events___idTable"
   ├─ "events___CollectionTypeInfo"
   ├─ "runs___idTable"
   ├─ "runs___CollectionTypeInfo"
   ├─ "PodioBuildVersion"
   └─ "EDMDefinitions"
julia> LazyBranch(f, "podio_metadata/events___idTable/m_names")
1-element LazyBranch{SubArray{String, 1, Vector{String}, Tuple{UnitRange{Int64}}, true}, UnROOT.Offsetjagg, ArraysOfArrays.VectorOfVectors{String, Vector{String}, Vector{Int32}, Vector{Tuple{}}}}:
["AllCaloHitContributionsCombined", "EventHeader", "BeamCalClusters", "BeamCalClusters_particleIDs", "BeamCalCollection", "BeamCalRecoParticles", "BeamCalRecoParticles_particleIDs", "BeamCal_Hits", "BuildUpVertices", "BuildUpVertices_RP" … "TightSelectedPandoraPFOs", "InnerTrackerBarrelHitsRelations", "InnerTrackerEndcapHitsRelations", "OuterTrackerBarrelHitsRelations", "OuterTrackerEndcapHitsRelations", "RefinedVertexJets_rel", "RelationCaloHit", "RelationMuonHit", "VXDEndcapTrackerHitRelations", "VXDTrackerHitRelations"]
julia> LazyBranch(f, "podio_metadata/events___idTable/m_collectionIDs")
1-element LazyBranch{SubArray{UInt32, 1, Vector{UInt32}, Tuple{UnitRange{Int64}}, true}, UnROOT.Offsetjagg, ArraysOfArrays.VectorOfVectors{UInt32, Vector{UInt32}, Vector{Int32}, Vector{Tuple{}}}}:
UInt32[0x3a25675d, 0xd793ab91, 0xf0d073dd, 0x1d19206c, 0xc298a348, 0xc29370d2, 0x3954b563, 0xd2b19e7b, 0xfd03f5d0, 0x310a0f04 … 0x5fa7cf93, 0x029be193, 0x743732ae, 0xc42bbbee, 0xd1211017, 0x8dac6bb6, 0x603a5016, 0xdf24625a, 0xbb4cff22, 0x178c9330]
julia> LazyTree(f, "podio_metadata", [Regex("events___idTable/(.*)") => s"\1"])
 Row │ m_names                                                     m_collectionIDs                                  ⋯
     │ SubArray{String                                             SubArray{UInt32                                  ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ ["AllCaloHitContributionsCombined", "EventHeader", "BeamC   [975529821, 3616779153, 4040192989, 488185964,   ⋯
                                                                                                  1 column omitted
Huh, I don't know why this doesn't error due to fID == -2, maybe because the custom struct logic doesn't hit that?
How did you get the fID == -2 to bubble up? Sorry for my ignorance, I have not looked closely enough 😆
Ah I see:
julia> UnROOT.LazyTree(f, "podio_metadata", ["events___idTable"])
fID = -2 # <- added a @show here...
ERROR: BoundsError: attempt to access 2-element Vector{Any} at index [-1]
Stacktrace:
[1] getindex(A::Vector{Any}, i1::Int64)
@ Base ./essentials.jl:13
[2] streamerfor(f::ROOTFile, branch::UnROOT.TBranchElement_10)
@ UnROOT ~/Dev/UnROOT.jl/src/root.jl:161
Yes, that negative fID is weird. I have some notes on it but I have no solution yet.
EDIT: and yes, if you go to the deepest split level and there is an interpretation (like the one for vector<unsigned int>), you will not hit the logic with the fID.
In this case UnROOT.streamerfor needs to figure out the parser logic from the actual streamer, which is there, but it fails due to the lookup. The lookup in this case is not index-based (on fID); instead, the streamer can be retrieved via the fName (below I also printed the available streamers).
It all boils down to taking the automatic parser generation to this level so that it works without using the split branches.
julia> UnROOT.streamerfor(f, "podio::CollectionIDTable")
e.streamer.fName = "TObject"
e.streamer.fName = "TCollection"
e.streamer.fName = "podio::GenericParameters"
e.streamer.fName = "pair<string,vector<int> >"
e.streamer.fName = "pair<string,vector<float> >"
e.streamer.fName = "pair<string,vector<string> >"
e.streamer.fName = "pair<string,vector<double> >"
e.streamer.fName = "vector<int>"
e.streamer.fName = "vector<float>"
e.streamer.fName = "edm4hep::CaloHitContributionData"
e.streamer.fName = "edm4hep::Vector3f"
e.streamer.fName = "podio::ObjectID"
e.streamer.fName = "edm4hep::CalorimeterHitData"
e.streamer.fName = "edm4hep::ClusterData"
e.streamer.fName = "edm4hep::ParticleIDData"
e.streamer.fName = "edm4hep::SimCalorimeterHitData"
e.streamer.fName = "edm4hep::ReconstructedParticleData"
e.streamer.fName = "edm4hep::VertexData"
e.streamer.fName = "edm4hep::EventHeaderData"
e.streamer.fName = "edm4hep::SimTrackerHitData"
e.streamer.fName = "edm4hep::Vector3d"
e.streamer.fName = "edm4hep::MCRecoTrackerHitPlaneAssociationData"
e.streamer.fName = "edm4hep::TrackerHitPlaneData"
e.streamer.fName = "edm4hep::Vector2f"
e.streamer.fName = "edm4hep::ObjectID"
e.streamer.fName = "edm4hep::MCParticleData"
e.streamer.fName = "edm4hep::Vector2i"
e.streamer.fName = "edm4hep::RecoParticleVertexAssociationData"
e.streamer.fName = "edm4hep::MCRecoCaloAssociationData"
e.streamer.fName = "edm4hep::TrackData"
e.streamer.fName = "edm4hep::TrackState"
e.streamer.fName = "edm4hep::Quantity"
e.streamer.fName = "podio::CollectionIDTable"
UnROOT.StreamerInfo(UnROOT.TStreamerInfo{UnROOT.TObjArray}("podio::CollectionIDTable", "", 0xe9251d6f, 1, UnROOT.TObjArray("", 0, Any[UnROOT.TStreamerSTL
version: UInt16 0x0004
fOffset: Int64 0
fName: String "m_collectionIDs"
fTitle: String ""
fType: Int32 500
fSize: Int32 24
fArrayLength: Int32 0
fArrayDim: Int32 0
fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
fTypeName: String "vector<unsigned int>"
fXmin: Float64 0.0
fXmax: Float64 0.0
fFactor: Float64 0.0
fSTLtype: Int32 1
fCtype: Int32 13
, UnROOT.TStreamerSTL
version: UInt16 0x0004
fOffset: Int64 0
fName: String "m_names"
fTitle: String ""
fType: Int32 500
fSize: Int32 24
fArrayLength: Int32 0
fArrayDim: Int32 0
fMaxIndex: Array{Int32}((5,)) Int32[0, 0, 0, 0, 0]
fTypeName: String "vector<string>"
fXmin: Float64 0.0
fXmax: Float64 0.0
fFactor: Float64 0.0
fSTLtype: Int32 1
fCtype: Int32 61
])), Set{Any}())
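The difference between the two lookups can be illustrated with a small, self-contained sketch. It does not use UnROOT's internals: `ElementStub`, `StreamerStub`, `streamer_by_name` and `element_by_fid` are hypothetical names, and the `fID + 1` indexing is only inferred from the BoundsError above (fID = -2 leading to index -1 on a 2-element vector). What a negative fID actually means is not settled in this thread.

```julia
# Hypothetical stand-ins for streamer metadata; only the fields needed here.
struct ElementStub
    fName::String
    fTypeName::String
end

struct StreamerStub
    fName::String
    elements::Vector{ElementStub}
end

streamers = [
    StreamerStub("podio::ObjectID", ElementStub[]),  # elements omitted in this sketch
    StreamerStub("podio::CollectionIDTable", [
        ElementStub("m_collectionIDs", "vector<unsigned int>"),
        ElementStub("m_names", "vector<string>"),
    ]),
]

# Name-based lookup: returns `nothing` if no streamer has that class name.
function streamer_by_name(streamers, classname::AbstractString)
    idx = findfirst(s -> s.fName == classname, streamers)
    return idx === nothing ? nothing : streamers[idx]
end

# Index-based lookup of one element; only meaningful for fID >= 0.
element_by_fid(s::StreamerStub, fID::Integer) = s.elements[fID + 1]

table = streamer_by_name(streamers, "podio::CollectionIDTable")

# fID = -2 reproduces the failure mode from the stack trace ...
failed = try
    element_by_fid(table, -2)
    false
catch err
    err isa BoundsError
end

# ... while the streamer found by name still carries all field names and types.
fields = [e.fName => e.fTypeName for e in table.elements]
println(failed, " ", fields)
```

The `fields` list is the kind of information a dynamically generated struct and reader would be built from.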
I need to study what uproot is doing with the negative fID, since it's able to get this right:
>>> import uproot
>>> f = uproot.open("/Users/tamasgal/Downloads/Output_REC.root")
>>> f["podio_metadata/events___idTable"]
<TBranchElement 'events___idTable' (2 subbranches) at 0x00010b58eb20>
>>> f["podio_metadata/events___idTable"].array()
<Array [{m_collectionIDs: [...], ...}] type='1 * {m_collectionIDs: var * ui...'>
Yeah, from my very quick look, uproot does not do anything with fID explicitly.
Yes... I mean, obviously the information is sitting right in front of us ;) So in that case UnROOT should create the corresponding struct and add a readtype or whatever dynamically. That's what's missing.
It's just a bit weird that this works fine in so many cases 😆 :
https://github.com/JuliaHEP/UnROOT.jl/blob/77b75d8f8a7d5a6a2b8c408efbcac2c00817e798/src/root.jl#L160
Fixed in v0.10.21.
@peremato let me know if it works for you.
Btw. just a little bit of clarification: the custom parsing always applies to a branch and not a tree (or set of branches). It's usually needed when the split-level is low (so that one needs to deserialise compound structures) or if the type for a specific branch is simply not supported.
First, thanks very much @tamasgal. It works great once you know how to do it.
The way to select branches and leaves is still very confusing to me (perhaps due to a lack of proper documentation, or of prior knowledge of the ROOT file organisation). This works nicely:
julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", [Regex("events___idTable/(.*)") => s"\1"])
Row │ m_names m_collectionIDs ⋯
│ SubArray{String SubArray{UInt32 ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ ["AllCaloHitContributionsCombined", "EventHeader", "BeamCalClusters", "BeamCalClusters_particleIDs", "B [975529821, 3616779153, 4040192989, 488185964, ⋯
1 column omitted
but what I would naively do does not:
julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", ["m_names", "m_collectionIDs"])
ERROR: MethodError: no method matching LazyBranch(::ROOTFile, ::Missing)
Closest candidates are:
LazyBranch(::ROOTFile, ::AbstractString)
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:134
LazyBranch(::ROOTFile, ::Union{UnROOT.TBranch, UnROOT.TBranchElement})
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:116
Stacktrace:
[1] LazyBranch(f::ROOTFile, s::String)
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:134
[2] LazyTree(f::ROOTFile, tree::UnROOT.TTree, treepath::String, branches::Vector{String}; sink::Type{LazyTree})
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:450
[3] LazyTree
@ ~/Development/UnROOT.jl/src/iteration.jl:432 [inlined]
[4] LazyTree(f::ROOTFile, s::String, branches::Vector{String}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:393
[5] LazyTree(f::ROOTFile, s::String, branches::Vector{String})
@ UnROOT ~/Development/UnROOT.jl/src/iteration.jl:390
[6] top-level scope
@ REPL[6]:1
The following works, but the names of the columns are wrong:
julia> meta = UnROOT.LazyTree(tfile, "podio_metadata", ["events___idTable/m_names", "events___idTable/m_collectionIDs"])
Row │ events___idTabl events___idTabl ⋯
│ SubArray{UInt32 SubArray{String ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ [975529821, 3616779153, 4040192989, 488185964, 3264783176, 3264442578, 961852771, 3534855803, 424489518 ["AllCaloHitContributionsCombined", "EventHead ⋯
1 column omitted
I also tried the naming convention that was used for the other tree, "events", with <branch>_<leaf>, but that does not work either. I see that for LazyBranch the convention is <branch>/<leaf>. Overall it is very confusing.
Yes, the problem is indeed that you need to know a little bit about the subtleties of the ROOT structure. As you can see, uproot also requires you to point to events___idTable but then does the automatic RecArray creation from the sub-branches. This is of course something I'd like to have in UnROOT as well, but it requires a lot of restructuring. As always, you learn ROOT iteratively, and early design decisions need to be changed quite often (I have had so many iterations in UnROOT already 😆).
I really hope that I will find a longer time slot (2-4 weeks) next year to spend a significant amount of time on refactoring UnROOT.
>>> import uproot
>>> f = uproot.open("/Users/tamasgal/Downloads/Output_REC.root")
>>> f["podio_metadata/events___idTable"]
<TBranchElement 'events___idTable' (2 subbranches) at 0x00010b58eb20>
>>> f["podio_metadata/events___idTable"].array()
<Array [{m_collectionIDs: [...], ...}] type='1 * {m_collectionIDs: var * ui...'>
Regarding the events tree, you do the same, but here too you need to provide the full path to the sub-branches:
julia> LazyTree(f, "events", [r"BeamCal_Hits/BeamCal_Hits.*\.(\w+)$" => s"\1"])
Row │ time x energyError energy y ⋯
│ SubArray{Float3 SubArray{Float3 SubArray{Float3 SubArray{Float3 Sub ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ [] [] [] [] [] ⋯
2 │ [] [] [] [] [] ⋯
3 │ [] [] [] [] [] ⋯
4 │ [] [] [] [] [] ⋯
5 │ [] [] [] [] [] ⋯
6 │ [] [] [] [] [] ⋯
7 │ [] [] [] [] [] ⋯
8 │ [] [] [] [] [] ⋯
9 │ [] [] [] [] [] ⋯
10 │ [] [] [] [] [] ⋯
11 │ [] [] [] [] [] ⋯
12 │ [] [] [] [] [] ⋯
13 │ [0.0, 0.0, [-8.2, -8. [0.0, 0.0, [0.0267, 0 [63 ⋯
14 │ [] [] [] [] [] ⋯
15 │ [] [] [] [] [] ⋯
16 │ [] [] [] [] [] ⋯
17 │ [] [] [] [] [] ⋯
18 │ [] [] [] [] [] ⋯
19 │ [0.0, 0.0] [3.17, 3.2 [0.0, 0.0] [0.0305, 0 [-1 ⋯
20 │ [] [] [] [] [] ⋯
21 │ [] [] [] [] [] ⋯
22 │ [0.0, 0.0] [151.0, 15 [0.0, 0.0] [0.0128, 0 [-8 ⋯
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋱
4 columns and 3 rows omitted
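How a `Regex => s"\1"` pair turns full sub-branch paths into short column names can be tried out in plain Julia, without a ROOT file. This is only a sketch: the branch paths are hypothetical (modelled on names that appear in this thread), `rename_branches` is an illustrative helper and not an UnROOT function, and it assumes the pair is applied to each path roughly like Base `replace`.

```julia
# Hypothetical branch paths, modelled on the names that appear in this thread.
paths = [
    "events___idTable/m_names",
    "events___idTable/m_collectionIDs",
    "BeamCal_Hits/BeamCal_Hits.energy",
    "BeamCal_Hits/BeamCal_Hits.position.x",
]

# Keep only the paths matching the regex and rewrite them with the substitution.
# Returns a vector of `full path => column name` pairs.
function rename_branches(paths, rule::Pair{Regex,<:SubstitutionString})
    pattern, substitution = rule
    return [p => replace(p, pattern => substitution) for p in paths if occursin(pattern, p)]
end

meta = rename_branches(paths, Regex("events___idTable/(.*)") => s"\1")
hits = rename_branches(paths, r"BeamCal_Hits/BeamCal_Hits.*\.(\w+)$" => s"\1")

foreach(println, meta)
foreach(println, hits)
```

Because only the captured group survives, two different paths can end up with the same column name, so the pattern has to be specific enough for the branches it selects.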
I was not doing this. If I do
julia> events = LazyTree(f, "events", ["BeamCal_Hits"])
Row │ BeamCal_Hits_en BeamCal_Hits_ti BeamCal_Hits_en BeamCal_Hits_po BeamCal_Hits_po BeamCal_Hits_po ⋯
│ SubArray{Float3 SubArray{Float3 SubArray{Float3 SubArray{Float3 SubArray{Float3 SubArray{Float3 ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ [] [] [] [] [] [] ⋯
2 │ [] [] [] [] [] [] ⋯
3 │ [] [] [] [] [] [] ⋯
4 │ [] [] [] [] [] [] ⋯
5 │ [] [] [] [] [] [] ⋯
6 │ [] [] [] [] [] [] ⋯
7 │ [] [] [] [] [] [] ⋯
8 │ [] [] [] [] [] [] ⋯
9 │ [] [] [] [] [] [] ⋯
10 │ [] [] [] [] [] [] ⋯
11 │ [] [] [] [] [] [] ⋯
12 │ [] [] [] [] [] [] ⋯
13 │ [0.0, 0.0, 0.0, 0.0, 0.0, [0.0, 0.0, 0.0, 0.0, 0.0, [0.0267, 0.0214, 0.0853, [3290.0, 3290.0, 3290.0, [-8.2, -8.16, -1.92, 31.1 [63.1, 63.1, 66. ⋯
14 │ [] [] [] [] [] [] ⋯
15 │ [] [] [] [] [] [] ⋯
16 │ [] [] [] [] [] [] ⋯
17 │ [] [] [] [] [] [] ⋯
18 │ [] [] [] [] [] [] ⋯
19 │ [0.0, 0.0] [0.0, 0.0] [0.0305, 0.0754] [-3350.0, -3360.0] [3.17, 3.21] [-19.2, -19.2] ⋯
20 │ [] [] [] [] [] [] ⋯
21 │ [] [] [] [] [] [] ⋯
22 │ [0.0, 0.0] [0.0, 0.0] [0.0128, 0.00132] [3360.0, 3380.0] [151.0, 151.0] [-86.8, -86.8] ⋯
23 │ [0.0] [0.0] [2.02f-6] [3390.0] [-62.9] [61.3] ⋯
and the leaves get the name <branch>_<leaf>
julia> names(events)
8-element Vector{String}:
"BeamCal_Hits_energyError"
"BeamCal_Hits_time"
"BeamCal_Hits_energy"
"BeamCal_Hits_position_z"
"BeamCal_Hits_position_x"
"BeamCal_Hits_position_y"
"BeamCal_Hits_cellID"
"BeamCal_Hits_type"
I mean, technically we can do this LazyTree creation on the fly automatically, but I could not come up with a way that works reliably, especially with all those funny (read: weird) namings and dot-madness. So eventually we need to ask the user to provide the regex to help UnROOT make reasonable field names like x instead of BeamCal_Hits.position.x, which would not be valid anyway due to the dots, so it needs to be translated to BeamCal_Hits_position_x or so. Notice here that BeamCal_Hits is redundant, since the branch is already called that. ROOT, however, still stores the name with that prefix. BUT not always, and I still don't know why. We have some logic in UnROOT which works quite OK, but it will still give you funny names in many cases. That's why I introduced that regex thing, which I highly abuse 😉 see here:
Actually, that is basically the place where we would need to incorporate the original streamer, which tells you how to name them and how the hierarchy is structured, but it's quite complex, and UnROOT would then really have to define those structs at runtime, which brings us to the...
...painful fact: if you let UnROOT define the structs, you will not be able to use those types in your own analysis code explicitly. Julia will of course happily pass you the instances, your function will eat those types as well, and everything is fine (and type-stable), but you will not be able to restrict or use those types to utilise multiple dispatch features, since they are created on the fly and attached to the UnROOT namespace (that would technically be type piracy), and of course you will have to deal with dynamic dispatch all(?) the time.
That's why I kind of like that we simply use LazyTree, which is a highly parametric type, signalling that it's a universal thing (like a named tuple), but it allows you to hide your data in some container type and/or reinterpret it as your own types. So we force the use of a barrier in order to be able to make use of a solid type system. That's what I have shown in the talk "KM3io.jl: Making UnROOT.jl comfortable for KM3NeT" (Tamas Gal).
On the other hand, you can of course provide your custom structs and make UnROOT utilise those, so you have full control and maximum efficiency. That's also shown in the presentation above, but of course requires more understanding of the underlying structures.
I use both techniques with great performance.
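The barrier idea can be sketched in a few lines of plain Julia. Everything here is an assumption for illustration: `CaloHit`, `to_hits` and `total_energy` are made-up names, and the named tuple is only a stand-in for one row of a LazyTree (the values are taken from row 19 of the table above).

```julia
# User-owned type: it lives in the analysis code, so methods can dispatch on it.
struct CaloHit
    energy::Float32
    time::Float32
end

# Stand-in for one row of a LazyTree: a named tuple of per-event vectors.
row = (energy = Float32[0.0305, 0.0754], time = Float32[0.0, 0.0])

# The barrier: convert the generic row into concrete, user-owned types once.
to_hits(row) = [CaloHit(e, t) for (e, t) in zip(row.energy, row.time)]

# Downstream code is written against CaloHit only and can use multiple dispatch.
total_energy(hits::AbstractVector{CaloHit}) = sum(h -> h.energy, hits; init = 0.0f0)

hits = to_hits(row)
println(length(hits), " hits, total energy ", total_energy(hits))
```

The generic container stays on one side of `to_hits`, and everything after it works with a type the analysis code controls.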
I was not doing this. If I do
Yes that works too, if you are fine with the UnROOT naming ;)
Hi Tom. I agree we can do several things and hide the UnROOT level. If you want, have a look at what I have been doing with EDM4hep.jl. I am mapping a simple Julia type (isbits) to a set of columns in the LazyTree within a StructArray in a recursive manner. This is very convenient and gives good performance for some use cases. There are some examples, like ttbar_digits.jl, to illustrate what you can do. I gave a presentation this week to the team developing this event model. It is very encouraging.
EDM4hep ROOT files store in a tree called podio_metadata an object of the type
The following is a reproducer:
The test file can be downloaded from https://github.com/peremato/EDM4hep.jl/blob/main/examples/Output_REC.root