HEPData / hepdata_lib

Library for getting your data into HEPData
https://hepdata-lib.readthedocs.io
MIT License
15 stars 37 forks source link

Retrieve histogram from stack via `RootFileReader.retrieve_object` #261

Closed IzaakWN closed 3 months ago

IzaakWN commented 3 months ago

It would be nice to have a short one-liner for retrieving a TH1 histogram that is inside the list of a THStack stack via RootFileReader.retrieve_object or RootFileReader.read_hist_1d.

This is easy to implement by adding a couple of lines in RootFileReader.retrieve_object:

    obj = self.tfile.Get(parts[0])
    for part in parts[1:]:
        if isinstance(obj,r.THStack):
            return obj.GetHists().FindObject(part)
        else:
            obj = obj.GetPrimitive(part)

https://github.com/IzaakWN/hepdata_lib/blob/513f132d289c27110f9f80173f661690eaea10c1/hepdata_lib/root_utils.py#L101-L104

The user can then retrieve the histogram as

reader = RootFileReader("figure.root")
points = reader.read_hist_1d("canvas/mainpad/stackname/histname")
hist   = reader.retrieve_object("canvas/mainpad/stackname/histname")

📚 Documentation preview 📚: https://hepdata-lib--261.org.readthedocs.build/en/261/

clelange commented 3 months ago

Thanks for this, I agree that this functionality would be useful.

Do you think you could come up with a small test for it? This would make sure that future changes won't break it. The rough idea would be to create a THStack and then try to read from it. What would happen, if the obj.GetHists().FindObject(part) fails to find the histogram?

IzaakWN commented 3 months ago

Hi @clelange, thanks for the feedback!

Tester

Do you think you could come up with a small test for it? This would make sure that future changes won't break it.

Added a separate test method to tests/test_rootfilereader.py. I checked it correctly retrieves a nonzero histogram from the stack in the ROOT file as expected, and the following works with any errors:

from tests.test_rootfilereader import TestRootFileReader
TestRootFileReader().test_retrieve_object_stack()

Error handling

What would happen, if the obj.GetHists().FindObject(part) fails to find the histogram?

I checked that an empty stack

from ROOT import TH1F, THStack
stack = THStack('stack','stack')
print(stack.GetHists()) # prints <cppyy.gbl.TList object at 0x0>
stack.GetHists().FindObject("ghost") # throws TypeError

throws

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: none of the 2 overloaded methods succeeded. Full details:
  attempt to access a null-pointer
  TObject* TList::FindObject(const TObject* obj) =>
    TypeError: could not convert argument 1

Although this is an unlikely use case. For a nonempty stack, retrieving a nonexistent histogram returns an empty TObject, i.e. None:

hist1 = TH1F('hist1','hist1',10,0,10)
stack.Add(hist1)
print(stack.GetHists().FindObject("ghost")) # prints <cppyy.gbl.TObject object at 0x0>
print(stack.GetHists().FindObject("ghost")==None) # prints True

Therefore, I changed the implementation to the following, such that an assertion error is thrown by the assert obj below if the user passes an invalid histogram path:

    obj = self.tfile.Get(parts[0])
    for part in parts[1:]:
        if isinstance(obj,r.THStack):
            obj = obj.GetHists().FindObject(part)
        else:
            obj = obj.GetPrimitive(part)

    assert obj

    return obj

https://github.com/IzaakWN/hepdata_lib/blob/cd93d57753b11642a30b66e4789c57374ad34e2b/hepdata_lib/root_utils.py#L98-L108

Another possible issue here is that if the user passes a path that is longer (e.g. "canvas/mainpad/stackname/histname1/histname2"), which will cause obj.GetPrimitive() to throw an error in the next iteration, although that's the same in the use case without the stack anyways (i.e. "canvas/mainpad/histname1/histname2").