JuliaHEP / UpROOT.jl

Julia package to access CERN ROOT files, wraps Python package uproot

Recognition of TTree breaks when matrices are in branches #5

Closed mmikhasenko closed 4 years ago

mmikhasenko commented 4 years ago

My ROOT tree contains covariance matrices, and tree[1] breaks when the Table is created:

https://github.com/JuliaHEP/UpROOT.jl/blob/master/src/ttree.jl#L40

This happens because _ndims(nt) in TypedTables.jl gets confused:

_ndims(::Type{<:Tuple{Vararg{AbstractArray{<:Any, n}}}}) where {n} = n

It requires all objects in the tuple to be arrays with the same number of dimensions n.

For some reason, for branches that hold matrices, pyobj(tree).arrays returns a single 3-d array rather than an array of matrices.

https://github.com/JuliaHEP/UpROOT.jl/blob/master/src/ttree.jl#L39
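
To illustrate the mismatch described above, here is a minimal sketch using only Base Julia. It assumes the matrix branch arrives as one 3-d array whose first dimension is the entry index; split_first_dim is a hypothetical helper name, not part of UpROOT.jl or TypedTables.jl, and the fix actually adopted in the package may look different.

```julia
# Hypothetical helper: split an N-d array into a Vector of (N-1)-d slices
# taken along the first (entry) dimension.
split_first_dim(A::AbstractArray) = [copy(selectdim(A, 1, i)) for i in axes(A, 1)]

# Three entries, each a 2×2 matrix, stored as one 3×2×2 array
# (assumed layout: entry index first).
cov3d = reshape(collect(1.0:12.0), 3, 2, 2)
covs = split_first_dim(cov3d)   # Vector of three 2×2 matrices

# A scalar branch is a plain 1-d vector.
energy = [10.0, 20.0, 30.0]

# Before the split the columns disagree on dimensionality (1 vs 3);
# afterwards every column is 1-d.
before = (ndims(energy), ndims(cov3d))
after = (ndims(energy), ndims(covs))
```

With every column 1-d, a signature such as the _ndims method quoted above can match a single n for the whole tuple.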

oschulz commented 4 years ago

Argh ... well, this should be fixable. Could you get me a test file?

mmikhasenko commented 4 years ago

sure, sending you a link via email

mmikhasenko commented 4 years ago

Sent. A solution is proposed in PR #6. Maybe you can also figure out why reading TTree-s is so slow and how to make it faster.

mmikhasenko commented 4 years ago

Maybe you can also figure out why reading TTree-s is so slow and how to make it faster.

I noticed that iteration over a Table with many (~1000) columns is very slow.

One solution I found is to pass an array of the branches I am interested in as an argument to getindex:

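function Base.getindex(tree::TTree, idxs::AbstractUnitRange, columns::Vector{String})
    @boundscheck checkbounds(tree, idxs)
    # Julia ranges are 1-based and inclusive; entrystart/entrystop are 0-based with an exclusive stop
    cols = pyobj(tree).arrays(entrystart = first(idxs) - 1, entrystop = last(idxs))
    # keep only the requested branches
    cols_filt = Dict([c=>cols[c] for c in columns])
    d2nt = _dict2nt(cols_filt)
    # make every column 1-d; array_of_first_dim is a helper that is not shown in this comment
    updated_d2nt = NamedTuple{keys(d2nt)}([ndims(v)==1 ? v : array_of_first_dim(v) for v in d2nt])
    Table(updated_d2nt)
end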

Any other good ideas?
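
As a side note on the entrystart/entrystop arguments in the snippet above: they translate a 1-based inclusive Julia range into a 0-based range with an exclusive stop. A minimal sketch of that conversion, where to_entry_bounds is a hypothetical name used only for illustration:

```julia
# Hypothetical helper: convert a 1-based inclusive range into
# (start, stop) with a 0-based start and an exclusive stop.
to_entry_bounds(idxs::AbstractUnitRange) = (first(idxs) - 1, last(idxs))

start, stop = to_entry_bounds(3:7)

# The number of selected entries is the same in both conventions.
n_entries = stop - start
```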

oschulz commented 4 years ago

Hi, sorry, been extremely busy, but haven't forgotten about this issue!

oschulz commented 4 years ago

I noticed that iteration over a Table with many (~1000) columns is very slow.

Indeed, in such cases it's best (actually in general) to only load the columns/branches of interest. Would be good to have an elegant (view-like?) API that makes this very easy.

oschulz commented 4 years ago

Should be fixed by #8. @mmikhasenko could you let me know if the current master branch works for you? If so, I'll release a new minor version of UpROOT.jl.

oschulz commented 4 years ago

This should make things more convenient: #9

Will allow you to do

tree[1:5, (:branch1, :branch2)]
tree[:, [:branch1, :branch2]]
tree[[1,4,7,10], ["branch1", "branch2"]]
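
For readers curious how a single getindex method can accept all three forms above, here is a rough sketch on a toy in-memory type. ToyTree and _colnames are hypothetical names invented for this illustration; this is not how UpROOT.jl implements it, and no ROOT file is read.

```julia
# Toy stand-in for a tree: branch name => column of values.
struct ToyTree
    cols::Dict{Symbol,Vector{Float64}}
end

# Normalize a tuple or vector of Symbols or Strings into a tuple of Symbols.
_colnames(cols) = Tuple(Symbol.(collect(cols)))

# rows may be a range, a vector of indices, or a colon;
# cols may be a tuple or vector of Symbols or Strings.
function Base.getindex(t::ToyTree, rows, cols)
    names = _colnames(cols)
    NamedTuple{names}(map(n -> t.cols[n][rows], names))
end

t = ToyTree(Dict(
    :branch1 => collect(1.0:10.0),
    :branch2 => collect(11.0:20.0),
    :branch3 => zeros(10),
))

a = t[1:5, (:branch1, :branch2)]
b = t[:, [:branch1, :branch2]]
c = t[[1, 4, 7, 10], ["branch1", "branch2"]]
```

Each call returns a NamedTuple holding only the requested columns, restricted to the requested rows; branch3 is never touched.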

oschulz commented 4 years ago

#9 is now merged and on master.

mmikhasenko commented 4 years ago

Great, thanks

oschulz commented 4 years ago

Ok, UpROOT v0.3.0 is released.