JuliaHEP / UnROOT.jl

Native Julia I/O package to work with CERN ROOT file objects (TTree and RNTuple)
https://juliahep.github.io/UnROOT.jl/
MIT License

Reading a file with Branches and Leafs #323

Closed TheFibonacciEffect closed 7 months ago

TheFibonacciEffect commented 7 months ago

I originally posted this on the Julia Discourse: https://discourse.julialang.org/t/reading-root-file-with-branches/112014. However, I believe this might actually be a bug in the package, so I am posting my issue here as well.

Here is a link to the file I would like to read: https://polybox.ethz.ch/index.php/s/1aNZCiDSO1f2Eig

Original Post on julia Discourse

This is what I had originally posted on the Julia Discourse:

I have a ROOT file with a tree, branches and leaves. When I try to read it using UnROOT like this:

using UnROOT
f = ROOTFile("out_files/pienu00000-00.root")

tree = LazyTree(f,"sim")

I get a bounds error:

ERROR: BoundsError: attempt to access 7-element Vector{Any} at index [-1]
Stacktrace:
  [1] getindex(A::Vector{Any}, i1::Int64)
    @ Base ./essentials.jl:13
  [2] streamerfor(f::ROOTFile, branch::UnROOT.TBranchElement_10)
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/root.jl:160
  [3] UnROOT.JaggType(f::ROOTFile, branch::UnROOT.TBranchElement_10, leaf::UnROOT.TLeafElement)
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/utils.jl:64
  [4] auto_T_JaggT(f::ROOTFile, branch::UnROOT.TBranchElement_10; customstructs::Dict{String, Type})
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/root.jl:365
  [5] auto_T_JaggT
    @ ~/.julia/packages/UnROOT/PnXmk/src/root.jl:360 [inlined]
  [6] LazyBranch(f::ROOTFile, b::UnROOT.TBranchElement_10)
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:117
  [7] LazyBranch(f::ROOTFile, s::String)
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:134
  [8] LazyTree(f::ROOTFile, tree::UnROOT.TTree, treepath::String, branches::Vector{String}; sink::Type{LazyTree})
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:450
  [9] LazyTree
    @ ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:432 [inlined]
 [10] LazyTree(f::ROOTFile, s::String, branches::Vector{String}; kwargs::@Kwargs{})
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:393
 [11] LazyTree(f::ROOTFile, s::String, branches::Vector{String})
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:390
 [12] LazyTree(f::ROOTFile, s::String; kwargs::@Kwargs{})
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:461
 [13] LazyTree(f::ROOTFile, s::String)
    @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:460
 [14] top-level scope
    @ /popos/home/caspar/Documents-old/code/semesterprojekt-pioneer/analysis/analyze.jl:4

However, reading ROOT files that do not have branches works just fine.

This is what the file looks like within UnROOT:

julia> f
ROOTFile with 2 entries and 27 streamers.
out_files/pienu00000-00.root
├─ sim (TTree)
│  ├─ "info"
│  ├─ "init"
│  ├─ "track"
│  ├─ "decay"
│  ├─ "ghost"
│  ├─ "ghostface"
│  └─ "upstream"
└─ PIMCRunHeader (PIMCRunHeader)

In C++ I can read it by branch.leaf:

    events = (TTree*)fp->Get("sim");
    unsigned int nEvents = events -> GetEntriesFast();
    int pions = events -> GetEntries("ghost.pdgid==211");

So I also tried using tree = LazyTree(f,"sim", "ghost.pdgid")

But I get another error:

ERROR: MethodError: no method matching LazyBranch(::ROOTFile, ::Missing)

Closest candidates are:
  LazyBranch(::ROOTFile, ::Union{UnROOT.TBranch, UnROOT.TBranchElement})
   @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:116
  LazyBranch(::ROOTFile, ::AbstractString)
   @ UnROOT ~/.julia/packages/UnROOT/PnXmk/src/iteration.jl:134

It works fine with a flat ROOT file, though:

f = ROOTFile("data/data.root")
tree = LazyTree(f,"events","NJet")

Method where the error occurs

The error occurs here. From my understanding, I have a branch with branch.fID < -1:

streamerfor(f::ROOTFile, branch::TBranch) = missing
function streamerfor(f::ROOTFile, branch::TBranchElement)
    fID = branch.fID
    # According to ChatGPt: When fID is equal to -1, it means that the
    # TBranch object has not been registered yet in the TTree's list of
    # branches. This can happen, for example, when a TBranch object has been
    # created, but has not been added to a TTree with the TTree::Branch()
    # method.
    #
    # TODO: For now, we force it to be 0 in this case, until someone complains.
    if fID == -1
        fID = 0
    end
    next_streamer = streamerfor(f, branch.fClassName)
    if ismissing(next_streamer)
        return missing
    else
        return next_streamer.streamer.fElements.elements[fID + 1]  # one-based indexing in Julia
    end
end

This method is located at ~/.julia/dev/UnROOT/src/root.jl
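
To make the failure concrete: the error message reports index -1 on a 7-element vector, which is what `elements[fID + 1]` yields when `fID == -2`. The sketch below mimics only the indexing step with a stand-in vector. `element_for` and the symbol placeholders are hypothetical names for illustration, not part of UnROOT.

```julia
# Stand-in for next_streamer.streamer.fElements.elements
# (7 entries, as in the BoundsError message). The contents are placeholders.
elements = Any[:e1, :e2, :e3, :e4, :e5, :e6, :e7]

# Hypothetical helper mirroring the indexing step of streamerfor:
# only fID == -1 is remapped to 0, any other negative value passes through.
function element_for(elements, fID)
    if fID == -1
        fID = 0
    end
    return elements[fID + 1]  # one-based indexing in Julia
end

# fID == -1 is remapped and picks the first element.
first_element = element_for(elements, -1)

# fID == -2 is not remapped, so the lookup uses index -1 and throws.
err = try
    element_for(elements, -2)
catch e
    e
end

println(first_element)
println(err isa BoundsError)
```

So any `fID` below -1 would need its own handling (or a different lookup) before the `fID + 1` indexing is reached.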

tamasgal commented 7 months ago

Just a very quick answer with a workaround, to allow you to proceed with your research. You can use the array() function to load the full data into memory. Since your file is small, laziness is not really needed, but your mileage may vary ;)

Anyway, here is an example:

julia> f = ROOTFile("/Users/tamasgal/Downloads/pienu00002-02.root")
ROOTFile with 2 entries and 27 streamers.
/Users/tamasgal/Downloads/pienu00002-02.root
├─ sim (TTree)
│  ├─ "info"
│  ├─ "init"
│  ├─ "track"
│  ├─ "decay"
│  ├─ "ghost"
│  ├─ "ghostface"
│  └─ "upstream"
└─ PIMCRunHeader (PIMCRunHeader)

julia> UnROOT.array(f, "sim/ghost/ghost.pdgid")
1200-element ArraysOfArrays.VectorOfVectors{Int32, Vector{Int32}, Vector{Int32}, Vector{Tuple{}}}:
 Int32[-14, 14]
 Int32[14]
 Int32[-14, 14]
 Int32[14]
 0-element view(::Vector{Int32}, 7:6) with eltype Int32
 Int32[-14]
 Int32[211]

I will have a closer look at the file. LazyTree and LazyBranch are sometimes a bit hard to use with nested data.
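
As a rough sketch of what can be done once the column is in memory, the C++ selection `ghost.pdgid==211` from the original post could be mimicked with plain Julia. The data below is a hand-written stand-in (an ordinary vector of vectors copied from the rows shown above), not the result of actually reading the file, and the variable names are made up for illustration. It assumes the array returned by `UnROOT.array` can be iterated the same way, event by event.

```julia
# Stand-in for the first rows of UnROOT.array(f, "sim/ghost/ghost.pdgid");
# each inner vector holds the PDG IDs recorded for one event.
pdgid = [
    Int32[-14, 14],
    Int32[14],
    Int32[-14, 14],
    Int32[14],
    Int32[],
    Int32[-14],
    Int32[211],
]

# Total number of pi+ (PDG ID 211) across all events.
n_pions = count(==(211), Iterators.flatten(pdgid))

# Number of events that contain at least one pi+.
n_events_with_pion = count(ev -> any(==(211), ev), pdgid)

println(n_pions)
println(n_events_with_pion)
```

Which of the two counts corresponds to the C++ call depends on how the selection treats array-valued leaves, so both are shown.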

tamasgal commented 7 months ago

OK, the first problem is that you are using the wrong path to the subbranch. I don't blame you, this is just something which is probably (read obviously) not very well documented ;)

If the structure is the following:

julia> f["sim/ghost"]
ghost
├─ ghost.fUniqueID
├─ ghost.fBits
├─ ghost.ghostID
├─ ghost.trackID
├─ ghost.stepID
├─ ghost.pdgid
├─ ghost.xpos
├─ ghost.ypos
├─ ghost.zpos
├─ ghost.time
├─ ghost.xmom
├─ ghost.ymom
└─ ghost.zmom

the path to the subbranch is "sim/ghost/ghost.pdgid". A LazyTree would then be (with the branch name given relative to the tree)

LazyTree(f, "sim", ["ghost/ghost.pdgid"])

or using a regex:

julia> LazyTree(f, "sim", [r"ghost/ghost\.(.*)" => s"\1"])
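
To illustrate what the `regex => substitution` pair expresses, here is a small sketch using only Base Julia string functions. It assumes the pair is used both to select the matching branch names and to rename them; the list of names is a hand-written stand-in, and `"track/track.pdgid"` is a hypothetical entry added only to show a non-matching name.

```julia
# Hand-written stand-in for the branch names inside the "sim" tree.
# "track/track.pdgid" is hypothetical and only serves as a non-match.
names = ["ghost/ghost.pdgid", "ghost/ghost.xpos", "track/track.pdgid"]

# The same pair as in the LazyTree call: a regex with one capture group
# and a substitution string that keeps only the captured part.
pattern = r"ghost/ghost\.(.*)" => s"\1"

# Keep the names matched by the regex ...
selected = filter(n -> occursin(first(pattern), n), names)

# ... and rename them by applying the substitution.
renamed = [replace(n, pattern) for n in selected]

println(selected)
println(renamed)
```

With this reading, the `ghost.` prefix is dropped and only the leaf name is left as the column name.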

However (now comes the real problem), you will face the following error:

ERROR: MethodError: no method matching UInt32()

Closest candidates are:
  UInt32(::HTTP.WebSockets.Mask)
   @ HTTP ~/.julia/packages/HTTP/1EWL3/src/WebSockets.jl:62
  UInt32(::Char)
   @ Base char.jl:127
  UInt32(::Float64)
   @ Base float.jl:884
  ...

Stacktrace:
 [1] LazyBranch(f::ROOTFile, b::UnROOT.TBranchElement_10)
   @ UnROOT ~/Dev/UnROOT.jl/src/iteration.jl:124

This happens because we never had a flat array before: we always use VectorOfVectors and do not cover the flat-array case 🙈
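
For context on the jagged layout mentioned here: a vector of vectors of this kind keeps all values in one flat buffer plus a list of element boundaries, which is why an empty event in the `array()` output above shows up as `view(::Vector{Int32}, 7:6)`. The sketch below rebuilds that layout by hand with Base Julia only. `data`, `offsets` and `event` are illustrative names, not the ArraysOfArrays or UnROOT internals.

```julia
# All PDG IDs of the seven events shown earlier, stored back to back.
data = Int32[-14, 14, 14, -14, 14, 14, -14, 211]

# offsets[i] is the index in `data` where event i starts;
# the last entry is one past the end of the buffer.
offsets = [1, 3, 4, 6, 7, 7, 8, 9]

# Hypothetical accessor: event i is a view into the flat buffer.
event(i) = view(data, offsets[i]:offsets[i + 1] - 1)

n_events = length(offsets) - 1

for i in 1:n_events
    println(i, " => ", collect(event(i)))
end
```

A flat branch has exactly one value per event, so there are no boundaries to read, and that case needs its own code path.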

I'll try to provide a fix asap.

tamasgal commented 7 months ago

Fix on the way in https://github.com/JuliaHEP/UnROOT.jl/pull/324

tamasgal commented 7 months ago

UnROOT 0.10.26 will allow you to do this:

julia> t = LazyTree(f, "sim", [r"ghost/ghost\.(.*)" => s"\1"])
 Row │ ypos             ghostID          time             stepID           fUniqueID        xmom             pd ⋯
     │ SubArray{Float3  SubArray{Int32,  SubArray{Float3  SubArray{Int32,  SubArray{UInt32  SubArray{Float3  Su ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ [982.0, 6.83]    [110000, 11000   [3230.0, 36.4]   [0, 0]           [0, 0]           [-25.6, -29.5]   [- ⋯
 2   │ [2590.0]         [110000]         [29.7]           [0]              [0]              [21.9]           [1 ⋯
 3   │ [258.0, -2230.   [110000, 11000   [774.0, 48.0]    [0, 0]           [0, 0]           [-44.3, 22.1]    [- ⋯
 4   │ [-1850.0]        [110000]         [60.6]           [0]              [0]              [14.1]           [1 ⋯
 5   │ []               []               []               []               []               []               [] ⋯
 6   │ [512.0]          [110000]         [587.0]          [0]              [0]              [-11.5]          [- ⋯
 7   │ [-11.0]          [110000]         [40.8]           [0]              [0]              [-2.61]          [2 ⋯
 8   │ [2890.0]         [110000]         [28.7]           [0]              [0]              [-8.56]          [1 ⋯
 9   │ []               []               []               []               []               []               [] ⋯
 10  │ [28.1]           [110000]         [41.0]           [0]              [0]              [-0.453]         [2 ⋯
 11  │ [2660.0]         [110000]         [799.0]          [0]              [0]              [-4.84]          [- ⋯
 12  │ [12.0]           [110000]         [41.0]           [0]              [0]              [2.28]           [2 ⋯
 13  │ [33.0]           [110000]         [41.8]           [0]              [0]              [-4.04]          [2 ⋯
 14  │ [-0.84]          [110000]         [40.5]           [0]              [0]              [-5.09]          [2 ⋯
 15  │ [664.0]          [110000]         [311.0]          [0]              [0]              [-3.08]          [1 ⋯
 16  │ []               []               []               []               []               []               [] ⋯
 17  │ [-1220.0]        [110000]         [640.0]          [0]              [0]              [7.4]            [1 ⋯
 18  │ [-2290.0, -269   [110000, 11000   [1410.0, 1410.   [0, 0, 0]        [0, 0, 0]        [0.0214, 0.01,   [2 ⋯
 19  │ [906.0]          [110000]         [8140.0]         [0]              [0]              [41.8]           [- ⋯
 20  │ [3430.0]         [110000]         [4960.0]         [0]              [0]              [-13.3]          [- ⋯
 21  │ []               []               []               []               []               []               [] ⋯
 22  │ [5.71]           [110000]         [41.0]           [0]              [0]              [6.6]            [2 ⋯
 23  │ []               []               []               []               []               []               [] ⋯
 24  │ [1330.0]         [110000]         [1250.0]         [0]              [0]              [-24.4]          [1 ⋯
 25  │ [-1050.0]        [110000]         [23.5]           [0]              [0]              [10.1]           [1 ⋯
 26  │ [-280.0, -2400   [110000, 11000   [1150.0, 39.1]   [0, 0]           [0, 0]           [0.412, -13.6]   [2 ⋯
 27  │ []               []               []               []               []               []               [] ⋯
 28  │ [-12.8]          [110000]         [1150.0]         [0]              [0]              [10.7]           [- ⋯
 29  │ [2770.0, 2080.   [110000, 11000   [1610.0, 20.5]   [0, 0]           [0, 0]           [20.8, -15.4]    [- ⋯
 30  │ [262.0, -866.0   [110000, 11000   [36.3, 35.7]     [0, 0]           [0, 0]           [-17.5, 5.21]    [1 ⋯
 31  │ []               []               []               []               []               []               [] ⋯
  ⋮  │        ⋮                ⋮                ⋮                ⋮                ⋮                ⋮            ⋱
                                                                                  7 columns and 1169 rows omitted

TheFibonacciEffect commented 7 months ago

Thank you so much for the timely response! Even on a Saturday, I appreciate that a lot.

tamasgal commented 7 months ago

Science never sleeps ;)

Let me know if you have other issues. The next weeks will be very busy for me though 🙈

Moelf commented 7 months ago

Huh, I didn't realize, but the test file is kind of large:

 test/samples/issue323.root | Bin 0 -> 5890205 bytes

Is it possible to reduce it?

TheFibonacciEffect commented 7 months ago

Sure, there you go: https://polybox.ethz.ch/index.php/s/PcYbFyA5WO2Txvt

Moelf commented 7 months ago

@TheFibonacciEffect that file seems to be empty:

julia> LazyTree("./test/samples/issue323_small.root", "sim", [r"ghost/ghost\.(.*)"])
 Row │ ghost_ymom       ghost_fBits      ghost_pdgid      ghost_trackID    ghost_ghostID    ghost_time       ghost_xmom       ghost_xpos       ghost_zmom      ⋯
     │ SubArray{Float3  SubArray{UInt32  SubArray{Int32,  SubArray{Int32,  SubArray{Int32,  SubArray{Float3  SubArray{Float3  SubArray{Float3  SubArray{Float3 ⋯
─────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
                                                                                                                                               4 columns omitted

TheFibonacciEffect commented 7 months ago

@Moelf Sorry about that, this is also a way to solve large file sizes I guess 😅 This file should work: https://polybox.ethz.ch/index.php/s/i6KKEMt8A7UtVM0

julia> LazyTree("./issue323_small.root", "sim", [r"ghost/ghost\.(.*)"])
 Row │ ghost_ymom                   ghost_fBits                  ghost_pdgid                  ghost_trackID                ghost_ghostID                ghost_ ⋯
     │ SubArray{Float3              SubArray{UInt32              SubArray{Int32,              SubArray{Int32,              SubArray{Int32,              SubArr ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 1   │ [-3.08, -3.08, -3.08, -3.08  [50331648, 50331648, 503316  [-13, -13, -13, -13, -13, -  [3, 3, 3, 3, 3, 3, 3, 3, 3,  [110000, 110001, 110002, 11  [21.7, ⋯
 2   │ [-2.92, -2.92, -2.92, -2.92  [50331648, 50331648, 503316  [211, 211, 211, 211, 211, 2  [1, 1, 1, 1, 1, 1, 1, 1, 1,  [110000, 110001, 110002, 11  [22.8, ⋯
 3   │ [0.417, 0.417, 0.417, 0.417  [50331648, 50331648, 503316  [211, 211, 211, 211, 211, 2  [1, 1, 1, 1, 1, 1, 1, 1, 1,  [110000, 110001, 110002, 11  [21.8, ⋯
 4   │ [-10.8, -10.8, -10.8, -10.8  [50331648, 50331648, 503316  [-14, -14, -14, -14, -14, -  [4, 4, 4, 4, 4, 4, 4, 4, 4,  [110000, 110001, 110002, 11  [10100 ⋯
 5   │ [-21.3, -21.3, -21.3, -21.3  [50331648, 50331648, 503316  [14, 14, 14, 14, 14, 14, 14  [2, 2, 2, 2, 2, 2, 2, 2, 2,  [110000, 110001, 110002, 11  [13.2, ⋯
 6   │ [-15.2, -15.2, -15.2, -15.2  [50331648, 50331648, 503316  [211, 211, 211, 211, 211, 2  [1, 1, 1, 1, 1, 1, 1, 1, 1,  [110000, 110001, 110002, 11  [22.3, ⋯
 7   │ [2.34, 2.34, 2.34, 2.34, 2.  [50331648, 50331648, 503316  [-14, -14, -14, -14, -14, -  [4, 4, 4, 4, 4, 4, 4, 4, 4,  [110000, 110001, 110002, 11  [1140. ⋯
 8   │ [28.6, 28.6, 28.6, 28.6, 28  [50331648, 50331648, 503316  [14, 14, 14, 14, 14, 14, 14  [2, 2, 2, 2, 2, 2, 2, 2, 2,  [110000, 110001, 110002, 11  [20.3, ⋯
 9   │ [9.87, 9.87, 9.87, 9.87, 9.  [50331648, 50331648, 503316  [211, 211, 211, 211, 211, 2  [1, 1, 1, 1, 1, 1, 1, 1, 1,  [110000, 110001, 110002, 11  [22.1, ⋯
 10  │ [10.3, 10.3, 10.3, 10.3, 10  [50331648, 50331648, 503316  [14, 14, 14, 14, 14, 14, 14  [2, 2, 2, 2, 2, 2, 2, 2, 2,  [110000, 110001, 110002, 11  [10.3, ⋯
 11  │ [-24.1, -24.1, -24.1, -24.1  [50331648, 50331648, 503316  [14, 14, 14, 14, 14, 14, 14  [2, 2, 2, 2, 2, 2, 2, 2, 2,  [110000, 110001, 110002, 11  [15.0, ⋯
  ⋮  │              ⋮                            ⋮                            ⋮                            ⋮                            ⋮                      ⋱
                                                                                                                              8 columns and 113 rows omitted

Thanks again for helping me out with the issue : )

tamasgal commented 7 months ago

Oh shoot, that's my bad, I misread the file size for whatever reason (I thought it was 500 kB) 🙈