Closed soleti closed 4 years ago
uproot_methods can read a small number of ROOT objects like TVector3, which could be written to TTrees and read by uproot instead of XYZVec or Hep3Vector. (https://github.com/scikit-hep/uproot-methods/tree/master/uproot_methods/classes)
(otherwise I agree!)
Hi Roberto,
Are you speaking about reading our art format event data files, TrkAna files, Stntuple files? All of the above? Something else?
Rob
On Apr 30, 2020, at 5:28 PM, Stefano Roberto Soleti notifications@github.com wrote:
The Python package uproot is becoming the de facto standard for converting TTrees into Numpy arrays or Pandas dataframes. Unfortunately, it is not able to read custom objects (and I think PyROOT also has this issue). In particular, I am referring to what happens e.g. in ComboHitDiag (but also other analyzers) where we store positions and directions as XYZVec. In my opinion, we should try to store this type of information as flat objects (for example three floats like pos_x, pos_y and pos_z) or fixed-size arrays (which can be read by uproot) in order to make TTrees as widely accessible as reasonably possible.
Be aware that the TVector family of objects has horrible computing performance. I was on a code review for some DUNE pattern recognition code. It was one of the few pieces of HEP code that I have ever seen with a true computing kernel. The author replaced the use of TVector with something else, in about 20 lines of code (leaving the interface as TVector). After this change the code ran 4 times faster. Previously the code spent all of its time in the c'tor and d'tor of TObject even though the code in question made no use of the TObject-ness of these objects.
Rob
On Apr 30, 2020, at 7:02 PM, ryuwd notifications@github.com wrote:
uproot_methods can read a small number of ROOT objects like TVector3, which could be written to TTrees and read by uproot instead of XYZVec or Hep3Vector. (https://github.com/scikit-hep/uproot-methods/tree/master/uproot_methods/classes)
(otherwise I agree!)
Hi Roberto, XYZVec is just our typedef for a ROOT-native (templated) class, not really a custom object. In general it is much less error prone to directly store objects instead of untyped collections of fundamental types, which need to be translated to/from objects on readback. Can we teach Numpy about these objects? Or maybe request support from the developers?
Dave
-- David Nathan Brown Dave_Brown@lbl.gov Office Phone (510) 486-7261 Fax 495-2957 Lawrence Berkeley National Lab MS 50R5008 (50-6026C) Berkeley, CA 94720
Hi everyone, I am speaking about any ROOT TTree output we want to analyze/read outside of art. I discussed this with the developer of uproot and it doesn't seem possible to easily "teach" uproot how to read these objects. Since for positions and directions the only information we need is really just the values of x, y, and z, we could just store them as arrays of fixed size. It is true that, as @ryuwd said, uproot is able to read a limited number of ROOT classes, but personally I think we should try to store TTrees that resemble tables of numbers as much as possible. This would enable "last-mile" analyses with basically any tool/language, so the user can break free of the ROOT ecosystem (if he wants :) ). Flattening the TTrees also has advantages in terms of speed of vectorized operations with pandas/numpy.
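As a rough illustration of the flat layout proposed here, the sketch below keeps positions as three plain columns and derives per-hit quantities with vectorized NumPy operations. The column names pos_x, pos_y and pos_z come from the proposal; the values and the helper function names are invented for the example, and no file reading is shown.

```python
import numpy as np

# Hypothetical flat columns, one entry per hit (values invented for illustration).
pos_x = np.array([3.0, 0.0, 1.0])
pos_y = np.array([4.0, 0.0, 2.0])
pos_z = np.array([0.0, 5.0, 2.0])


def magnitude(x, y, z):
    """Length of each position vector, computed for all hits at once."""
    return np.sqrt(x**2 + y**2 + z**2)


def transverse_radius(x, y):
    """Distance of each hit from the z axis, computed for all hits at once."""
    return np.hypot(x, y)


mag = magnitude(pos_x, pos_y, pos_z)
rho = transverse_radius(pos_x, pos_y)
```

Quantities that a vector class would offer as methods can be recomputed this way from the stored numbers, at the cost of the reader having to know which columns belong together.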
Hi Roberto,
On Apr 30, 2020, at 10:38 PM, Stefano Roberto Soleti notifications@github.com wrote:
Hi everyone, I am speaking about any ROOT TTree output we want to analyze/read outside of art.
All of our art files are stored with the Event TTree maximally split, which means that every float/int/double etc. is its own leaf. How does this differ from what you are asking for? I don't know about the structure of TrkAna and Stntuple files.
Which of these have you looked at?
One of our planned projects is to understand if changing to a less-than-maximally split file would improve IO performance enough to be interesting.
It is true that they are maximally split, but for example ComboHitDiag stores XYZVec objects, which can't be read by uproot. My proposal is to try to store information with fundamental types as much as reasonably possible. I think in the case of the positions and directions this is reasonable.
Hi Roberto, There are several design issues here. What you are asking for is a translation stage. Translation is error prone and should be avoided if possible. Second, objects were invented for a reason, namely to keep related info together and provide methods that make sense on the ensemble (like R(), magnitude of a vector). We give that up if we flatten content to a list of floats. Finally, variable length branches are essential for some things, like info about individual hits on a track, which can’t be flattened (can uproot or numpy handle that?).
I am sympathetic to making our data accessible to as many tools as possible, but we shouldn’t give up important design considerations. In my opinion being able to add support for classes we need should be a requirement of any analysis tool we decide to support. If core developers don’t want to do this for us, maybe someone in Mu2e needs to work on it. If the core design of uproot or Numpy is such that adding new object support is intrinsically impossible or very difficult, maybe we should look for a different tool.
Dave
On Thu, Apr 30, 2020 at 21:19 Stefano Roberto Soleti < notifications@github.com> wrote:
It is true that they are maximally split, but for example ComboHitDiag stores XYZVec objects, which can't be read by uproot. My proposal is to try to store information with fundamental types as much as reasonably possible. I think in the case of the positions and directions this is reasonable.
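On the question of variable-length branches raised above: one common way to keep such data in flat arrays is to store a single flat content array together with a per-entry count. The sketch below illustrates that layout using only the Python standard library. The branch names nhits and hit_time and all values are hypothetical, and it makes no claim about how uproot represents such branches internally.

```python
from itertools import accumulate

# Hypothetical branches: number of hits on each track, and one flat array
# holding the hit times of all tracks back to back.
nhits = [2, 0, 3]
hit_time = [10.0, 11.0, 20.0, 21.0, 22.0]


def unflatten(counts, content):
    """Split a flat content list into one sub-list per entry, using the counts."""
    offsets = [0, *accumulate(counts)]  # start/stop index of each entry
    if offsets[-1] != len(content):
        raise ValueError("counts do not sum to the content length")
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


per_track = unflatten(nhits, hit_time)
```

In this layout both stored arrays are plain lists of numbers, so no class dictionary is needed to read them, while the per-track grouping can still be recovered.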
Hi Dave,
I think there are two issues here being conflated:
ComboHitDiag included) this is not adding any extra feature and it's making the TTree bloated and essentially ROOT-dependent. For example you mention R(), but do we really need the magnitude of the position or direction vectors? In my opinion the ntuple should be as close to a collection of numbers as possible and shouldn't necessitate a class dictionary to be read.
Finally, variable length branches are essential for some things, like info about individual hits on a track, which can’t be flattened (can uproot or numpy handle that?).
In response to this, maybe I can offer an example of how I have been analysing a ROOT Tree with uproot and pandas. I use a Tree in an Analyzer I wrote to produce diag plots for nofield cosmic tracking, and (unrelated) set up an alignment iteration. I think it is in line with the style Roberto is proposing.
The Python package uproot is becoming the de facto standard for converting TTrees into Numpy arrays or Pandas dataframes. Unfortunately, it is not able to read custom objects (and I think PyROOT also has this issue). In particular, I am referring to what happens e.g. in ComboHitDiag (but also other analyzers) where we store positions and directions as XYZVec. In my opinion, we should try to store this type of information as flat objects (for example three floats like pos_x, pos_y and pos_z) or fixed-size arrays (which can be read by uproot) in order to make TTrees as widely accessible as reasonably possible.
Edit: in the past I opened an issue on the uproot github where the author confirmed that there is no easy way to read these objects (https://github.com/scikit-hep/uproot/issues/418).