RNO-G / mattak

RNO-G dataformats
1 stars 1 forks source link

Add option to select channels #43

Closed fschlueter closed 3 months ago

fschlueter commented 4 months ago

This is not ready yet but you can have a look at the idea and my pedantic implementation. I am not sure if it can be change such that only data from selected channels are read out of the root file. And thinking further, no idea how to implement it in the c++/pyroot backend.

fschlueter commented 4 months ago

I changed the base to a PR branch such that the diff is easier to inspect.

fschlueter commented 3 months ago

@cozzyd ping

fschlueter commented 3 months ago

I can implement the comments easily, I not sure about the c++ implementation though - I might give it a try

cozzyd commented 3 months ago

I looked into making CalibratedWaveforms lazy.

The problem I foresee is that we probably want to force calibration before any potential serialization (writing the CalibratedWaveforms to a file or tree), otherwise we have to serialialize the calibration object as well (which you obviously don't want to do with every event) or figure out some other way to reconstitute it should you read out a CalibratedWaveforms that has only been partially calibrated.

But I suppose that's fine, if you want to write it out, you might as well calibrate every channel. The other issue is you either have to copy the data in, or have the lifetime of raw waveform outlive the the calibrated waveform. I think copying is the easier thing, but a bit annoying (increases memory footprint of a calibrated waveform by 25% in the case where all channels are calibrated). I guess we could drop the raw waveform once the waveform is calibrated though, and then it should be a net savings if each channel is allocated on demand (as opposed to upfront), but then we lose contiguity between channels. We could also store the raw waveform within the footprint of the calibrated waveform, then copy it to a temporary space and then replace it with the calibrated version upon calibration, which won't use up any more space than the existed CalibratedWaveform but is a bit of gymnastics.

Because C++ doesn't have the equivalent of the property decorator, it would also break the API of CalibratedWaveforms, but I'm not sure if there are any users (and it would be detected at compile time...).

A simpler alternative would be to write a new, unserializable LazyCalibratedWaveforms (which could be converted to a CalibratedWaveform on demand) and use that in the Dataset. I think that might actually be the best option...

fschlueter commented 3 months ago

I looked into making CalibratedWaveforms lazy.

The problem I foresee is that we probably want to force calibration before any potential serialization (writing the CalibratedWaveforms to a file or tree), otherwise we have to serialialize the calibration object as well (which you obviously don't want to do with every event) or figure out some other way to reconstitute it should you read out a CalibratedWaveforms that has only been partially calibrated.

But I suppose that's fine, if you want to write it out, you might as well calibrate every channel. The other issue is you either have to copy the data in, or have the lifetime of raw waveform outlive the the calibrated waveform. I think copying is the easier thing, but a bit annoying (increases memory footprint of a calibrated waveform by 25% in the case where all channels are calibrated). I guess we could drop the raw waveform once the waveform is calibrated though, and then it should be a net savings if each channel is allocated on demand (as opposed to upfront), but then we lose contiguity between channels. We could also store the raw waveform within the footprint of the calibrated waveform, then copy it to a temporary space and then replace it with the calibrated version upon calibration, which won't use up any more space than the existed CalibratedWaveform but is a bit of gymnastics.

Because C++ doesn't have the equivalent of the property decorator, it would also break the API of CalibratedWaveforms, but I'm not sure if there are any users (and it would be detected at compile time...).

A simpler alternative would be to write a new, unserializable LazyCalibratedWaveforms (which could be converted to a CalibratedWaveform on demand) and use that in the Dataset. I think that might actually be the best option...

I admit I am a bit lost here. What means making CalibratedWavefroms lazy?

fschlueter commented 3 months ago

PS, this PR might be redundant when we merge #45

fschlueter commented 3 months ago

I will close this PR, the changes introduced here are also in #45