cep 2: event structure - Githubissues

kosack commented 1 year ago

Related note: I wonder if we could provide a rope or bowler rule to automatically refactor code (as a first pass?)

Rope in particular mentions this as a feature:

Restructuring (like converting ${a}.f(${b}) to ${b}.g(${a}) where a: type=mymod.A)

RuneDominik commented 1 year ago

Looking at the proposed structure the muon information is missing. In the current stucture it is listed on the same structure-level as the data-levels. What would be the new structure for it?

ccossou commented 1 year ago

I have a few comment/questions:

dl3 combine multiple telescopes, so that datalevel probably can't be at the same level in the data structure than others. If that's true, it would be weird to have dlX not at the same place in the structure.
in the event of an observation block being too big, and if we have to implement a segmented datastructure to store all DL0 information on disk in multiple sub-files, would the old or new structure be better shaped to handle such an evolution?

My uneducated guess would tend to prefer the following structure: dl1.telX.event.waveform

Meaning that the array of event would be directly packed as data level then telescope when it make sense, or for dl3, only have dl3.event.

I don't know if from a code point of view this make sense, but that's how I'd do it.

maxnoe commented 1 year ago

dl3 combine multiple telescopes, so that datalevel probably can't be at the same level in the data structure than others. If that's true, it would be weird to have dlX not at the same place in the structure.

Yes, we probably won't have TelescopeEventContainer.dl3 , but only ArrayEventContainer.dl3

in the event of an observation block being too big, and if we have to implement a segmented datastructure to store all DL0 information on disk in multiple sub-files, would the old or new structure be better shaped to handle such an evolution?

My uneducated guess would tend to prefer the following structure: dl1.telX.event.waveform

That's what we have now and this CEP is proposing to change that.

I will expand the text to better explain, but I think it's the opposite: we will have DL0 event files per telescope, so we might want to process DL0 to DL1 for telescopes individually, which will be much more natural if we have multiple data levels inside one TelescopeEvent (to perform the dl0 to dl1 transformation on one telescope) than having to create ArrayEvents for each Telescope with only one telescope in each of the data levels.

It is also much more natural then to merge multiple TelescopeEvent s into one ArrayEvent.

By the way: this is also how sim_telarray output is organized:

ArrayEvent[2010](event_id=9610)
     TriggerInformation[2009](event_id=9610)
     TelescopeEvent[2225](telescope_id=25)
         TelescopeEventHeader[2011](telescope_id=25)
         ADCSamples[2013](telescope_id=25)
         PixelTiming[2016](telescope_id=25)
         ImageParameters[2014](telescope_id=25)
         PixelList[2027](telescope_id=25, code=1, kind=selected)
     TelescopeEvent[3225](telescope_id=125)
         TelescopeEventHeader[2011](telescope_id=125)
         ADCSamples[2013](telescope_id=125)
         PixelTiming[2016](telescope_id=125)
         ImageParameters[2014](telescope_id=125)
         PixelList[2027](telescope_id=125, code=1, kind=selected)
     TelescopeEvent[3230](telescope_id=130)
         TelescopeEventHeader[2011](telescope_id=130)
         ADCSamples[2013](telescope_id=130)
         PixelTiming[2016](telescope_id=130)
         ImageParameters[2014](telescope_id=130)
         PixelList[2027](telescope_id=130, code=0, kind=triggered)
         PixelList[2027](telescope_id=130, code=1, kind=selected)

kosack commented 1 year ago

Looking at the proposed structure the muon information is missing. In the current stucture it is listed on the same structure-level as the data-levels. What would be the new structure for it?

Muon info are just DL1 parameters, so would appear under event.tel[tel_id].dl1 However, we might need to have DL1 split by a Map of strings similar to DL2, so that we can support multiple images/cleanings

maxnoe commented 1 year ago

I updated the cep with a couple of before / after code examples and some more text

kosack commented 1 year ago

"Accepted" and start working on the change. Then we record in this CEP some typical changes that are needed to refactor the code to fit the new structure, and every gnarly special case along with its solution.

One question is when to involve other stakeholders, i.e. the LST, NectarCam, CHEC/SSTCam, (and maybe MAGIC?) teams that are using ctapipe already. Should we ask for comments or just go ahead? At the very least, we might want to demo it at a weekly meeting and invite them to join

kosack commented 1 year ago

Oh, one thing I just realized is not mentioned here that could be useful is the canonical way to determine if a telescope has information at a particular data level. Now that there is only one tel_id map, just looking at tel.keys() is not sufficient, so it would be good to show how it should be done.

In the example:

hillas_dicts = {
    tel_id: dl1.parameters.hillas
    for tel_id, dl1 in event.dl1.items()
}

#After:

hillas_dicts = {
    tel_id: tel_event.dl1.parameters.hillas
    for tel_id, tel_event in event.tel.items()
}

the case where there is no dl1 object or no tel_event.dl1.parameters for a given tel isn't explained.

maxnoe commented 1 year ago

How we work at the moment is that all telescopes get all data levels at the same time and we fill invalid values for telescopes that do not fulfill some criteria (e.g. we will have the image parameters, but they might be nan if not enough pixels survived the cleaning).

So the situation you mentioned, that there are dl1 containers for some telescopes but not others cannot really happen. The DL1 container will be there for all telescope events, but some might contain invalid value markers.

The full loop in the hillas reconstructors looks like this now:

        hillas_dict = {
            tel_id: dl1.parameters.hillas
            for tel_id, dl1 in event.dl1.tel.items()
            if all(self.quality_query(parameters=dl1.parameters))
        }

and will look like this after the change:

        hillas_dict = {
            tel_id: tel_event.dl1.parameters.hillas
            for tel_id, tel_event in event.tel.items()
            if all(self.quality_query(parameters=tel_event.dl1.parameters))
        }

Tobychev commented 1 year ago

To restate what I said in inline comment (in violation of CEP 1!), I think this CEP looks pretty clear, and the reasons for carrying out these changes are sound.

One question is when to involve other stakeholders, i.e. the LST, NectarCam, CHEC/SSTCam, (and maybe MAGIC?) teams that are using ctapipe already. Should we ask for comments or just go ahead? At the very least, we might want to demo it at a weekly meeting and invite them to join

I just re-read CEP 1, and one thing I think we've forgotten is announcing this CEP according to the workflow we defined, in particular I think this haven't happened:

an announcement with a link to the pull request should be sent both to the ctapipe mailing list and the ctapipe coordinator list.

so some sort of announcement that will reach more people is already mandated. Separately, I think prodding those "external" stakeholders to leave a review (or at least proof -of-skimming) is valuable, but I'll let @kosack judge if a mailing list announcement is a firm enough poke to get some response.

maxnoe commented 1 year ago

I made one last change (removing trigger from the top-level of both SubarrayEvent and TelescopeEvent.

This is related to #2374, trigger info should be part of DL0.

maxnoe commented 1 year ago

I started implementing a zfits event source following the current draft of the ACADA DPPS ICD, in part as preparation for the ACADA LST integration tests on La Palma.

On thing I realized is that with this proposal here, it would be possible to implement subarray event sources and telescope event sources.

The latter would be useful for reading DL0 telescope event data and processing it up to dl1 in parallel, where we don't need stereo information and to read the anticipated separate streams for shower, calibration and muon events.

E.g. calib pipe would probably only ever use a DL0TelescopeEventSource on calibration and muon stream files and dl1 could processing could also run using that on the shower stream.

A possible SubarrayEventSource would then read the single files and merge the separate telescope data into the SubarrayEvent structure.

kosack commented 1 year ago

Will merge into "proposed" and we will open a new PR moving it to "Accepted" for external review

maxnoe commented 1 year ago

Ok, merging now to have it under proposed rendered in the docs so we can collect comments.

cta-observatory / ctapipe

cep 2: event structure #2304