RNO-G / mattak

RNO-G dataformats

Refactor dataset._iterate of the uproot backend for better readability #8

Closed · fschlueter closed this 1 year ago

fschlueter commented 1 year ago

Hey @cozzyd, this PR should not change the functionality, and performance should be roughly unchanged; it is only for readability, because I had a really hard time understanding the old function. I tested the new iterate by comparing its output with the old iterator for one file and different values of max_in_mem (I only checked the event numbers). Is that sufficient? Did I understand the idea of the original function correctly?

cozzyd commented 1 year ago

This also clobbers whatever the entries used to be, which could cause surprising behavior if someone mixes the iterator with the other way of accessing events, or nests iterates. For example, I believe the following would behave differently now (and differently between the uproot and pyroot backends):

d.setEntries(0)

for info, wf in d.iterate():
  wf -= d.wfs()  # before this change, this would have subtracted the first entry; now it will subtract whatever the current batch is
  ...

I think you can just set the entries back to what they used to be after calling wfs() / eventInfo() in the loop.
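The suggested fix can be sketched on a toy stand-in for the Dataset API (the method names setEntries, wfs, iterate and the max_in_mem parameter mirror the thread, but the class itself is a hypothetical minimal model, not mattak's implementation): the iterator remembers the caller's entry selection before loading each batch and restores it right after reading, so wfs() inside the loop still refers to whatever the caller set.

```python
class Dataset:
    """Toy model of the mattak Dataset API (hypothetical, for illustration)."""

    def __init__(self, n_events):
        self.n_events = n_events
        self.current = (0, 1)  # entry selection set via setEntries()

    def setEntries(self, start, stop=None):
        self.current = (start, start + 1 if stop is None else stop)

    def wfs(self):
        # Stand-in for waveform arrays: just the selected entry numbers.
        start, stop = self.current
        return list(range(start, stop))

    def iterate(self, max_in_mem=2):
        preserve_entries = self.current  # remember the caller's selection
        for start in range(0, self.n_events, max_in_mem):
            self.setEntries(start, min(start + max_in_mem, self.n_events))
            batch = self.wfs()  # load this batch
            self.setEntries(*preserve_entries)  # hide that we modified entries
            yield from batch


d = Dataset(5)
d.setEntries(0)
for wf in d.iterate():
    # The caller's selection is untouched inside the loop.
    assert d.wfs() == [0]
```

Without the setEntries(*preserve_entries) restore, wfs() inside the loop would return the current batch instead of entry 0, which is exactly the surprising behavior described above.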

fschlueter commented 1 year ago

I ran the following code on both the main and the feature branch:

import numpy as np

d.setEntries(0)

for ei, wf in d.iterate():
    print(np.mean(wf - d.wfs()))

The result differs between the two branches, but honestly I do not know why. I also do not understand why we get more than one event; I always interpreted d.setEntries(0) as also affecting iterate()...

Does this have to do with the following line in the previous function: self.setEntries(preserve_entries)  # hide that we're modifying these

Ahh, I think I am starting to understand. It is already awkward, though, that pyroot does not use max_in_mem.

fschlueter commented 1 year ago

Okay, I think I have fixed it.