Closed fschlueter closed 1 year ago
Yeah, this is a good feature to have!
Strange that your benchmark has the opposite behavior from what I have seen (though maybe I was comparing PyROOT vs the current NuRadioReco implementation rather than the two mattak implementations).
It's a bit hard to make uproot faster here because it's inefficient to get arbitrary slices of events. In the case where the trigger is rare you'll win by getting events one at a time on demand, but if it's not rare, then batching the reading probably helps. I guess we can come up with a heuristic for that.
Okay, I realise one can also only modify a few line in Dataset.py and does not have to touch the backends:
if trigger is not None:
for info, traces in self._iterate(start,stop,calibrated,max_entries_in_mem):
if info.triggerType == trigger:
yield info, traces
else:
yield from self._iterate(start,stop,calibrated,max_entries_in_mem)
But than actually the pyroot version does not get faster..
yeah, because the waveform loading is done even if you don't need it. I think the more invasive change is better. However, I can imagine another similar cut would be to cut on the trigger time, which would enable selecting cal pulsers that are synchronized to GPS. A more generic interface might be a callback on the eventInfo to decide whether or not to read the event?
For example, the interface can be
def iterate(self, start : int = 0, stop : Union[int,None] = None, calibrated: bool = False, max_entries_in_mem : int = 256, selector = Optional[Callable[EventInfo,bool]]) -> Generator[Optional[Tuple[EventInfo, numpy.ndarray]],None,None]:
and you could then do something like
d.iterate ( selector = lambda info : info.triggerType == 'Force' )
uh I agree, much nicer:)
here we go. Only minimal changes. Unfortunately I had trimming trailing whitespace activated... So it might be a bit cumbersome to see the changes... sorry
Feel free to merge when ready! Don't worry about the trailing spaces... I don't have my vim configured to remove them but I probably should.
Hi @cozzyd,
do you like this kind of feature? I think it would be useful. I am not sure if my implementation is the most elegant and efficient.
However I ran a little speed test with the following code:
On the main branch pyroot took 55s and uproot 29s. On this branch it took with pyroot 30s and with uproot 29s. (Of course with the main branch you actually access more events). (The reason why uproot does get faster is also pretty obvious, maybe we can improve it)...