AIDASoft / podio

PODIO
GNU General Public License v3.0
23 stars 57 forks source link

Invalid data for collections from a frame indexed in the same line in python #642

Open jmcarcell opened 1 month ago

jmcarcell commented 1 month ago

The issue is the following, when indexing a frame iterator and using it in the same line:

import podio
podio_reader = podio.root_io.Reader("example_frame.root")
frames = podio_reader.get("events")
print(len(frames[0].get('mcparticles')))  # prints a random number like 811654108391827

the number printed is consistent with reading some random memory (64 bits). Or if the number is too big, the error will be:

OverflowError: cannot fit 'int' into an index-sized integer

The workaround is easy, just save the intermediate frame (explaining why this has not been before since typically one will loop over the frames):

ev = frames[0]
print(len(ev.get('mcparticles')))  # prints 10, all is good here

From this it just seems that frames[0] is not constructed properly in python. Any other operation that I have tried will also fail.

It seems not to be an issue with podio (leaving only cppyy), at least not with the python code, because if we get down to the functions that are actually being called and do:

print(podio.Frame(cppyy.gbl.std.move(podio_reader._reader.readEntry("events", 0))).get('mcparticles').size())

this will also print a random number. I'll try to prepare an example to report this to cppyy if that is the case (looks like).

tmadlener commented 1 month ago

Does this work on the c++ side? In principle it should as everything should live until the end of the statement. But in this case we create a temporary frame, from which we get a const & to a collection on which we then call size, so I am not entirely sure.

jmcarcell commented 1 month ago

Yes, on C++ there aren't any surprises and you can do everything in one line. But here frames[0] not being alive before the line ends seems unexpected.

wlav commented 1 month ago

C++ and Python have different rules for temporaries: C++ guarantees that the lifetime of temporaries is at least until the end of the statement; in Python, temporaries go away whenever the reference count returns to zero, which may be in the middle of a long statement. If a C++ method returns something by reference to data that lives in a temporary object (that came to be from a different function call, say, in the long statement), then that data may go away before the end of life of the reference.

cppyy tries to set "lifelines" from such references onto the object that holds the data by checking whether the reference points to somewhere within the object on which the method was called. This automatic system would fail if the reference is to something that is not part of the temporary, but is still controlled by that temporary. E.g. if the reference points into data that is held by pointer by the object.