Should DataContainer() only contain "data"?

calispac commented 6 years ago

For the calibration we need to keep track of the "conditions" in which the data where taken.

Example:

What is the DAC level of each LED
Camera server config
ZfitsWriter config
DigiCam config

To my undestanding what is listed above is going to be part of the .zfits header (correct? @Etienne12345 )

We also need a description of the Camera geometry+hardware components (but this is static in principle i.e. not configurable). We have utils.DigiCam() for that already.

So here I see two options to "store" this information:

Each configurable is a python class, that one would instantiate when reading a file.
All of the information above is stored in the DataContainer() and accessible via: event.stuff

dneise commented 6 years ago

In my opinion everything should be reachable from the event.

calispac commented 6 years ago

Seems to not make sens to merge data and setup in .zfits and then split it in analysis again.

calispac commented 6 years ago

In my opinion everything should be reachable from the event.

In that case should we "allow" containers to not contain Fields() only but any python object?

For instance one could have the current InstrumentContainer() https://github.com/cta-sst-1m/digicampipe/blob/b21e1a1d10dd28cb828f25653921377bb81a6ce2/digicampipe/io/containers.py#L39-L56

be replaced by:

class InstrumentContainer(Container):

    pdp = utlis.Camera() # replace patch, cluster, num_pixels etc.
    optics = CameraOptics() # Not needed at the moment
    cts = CameraTestSetup() # state CTS
    writer = ZfitsWriter() # state of writer
    digicam = DigiCam() # config/state of digicam
    etc....

If the information on the data taking condition is in the header of .zfits should we let the event_stream() fill each of this Container attribute?

dneise commented 6 years ago

I would even drop the entire ctapipe.Container idea completely. It is a wrong way in my opinion

dneise commented 6 years ago

A dict is just fine

calispac commented 6 years ago

I would even drop the entire ctapipe.Container idea completely. It is a wrong way in my opinion

yes maybe too advanced for now. Just python usual data structures is fine

dneise commented 6 years ago

Ah one addittion: I really like namedtuple ... when we want to have a dict-like container, where we somehow know the keys in advance .. like when we read an event, we already know all the keys from the .proto file.

also namedtuples are a little bit "read-only" ... so we can be a bit sure no process is accidentally overwriting the adc-samples.

I like the idea of processors only adding information to the event ... so a processor like "set_pixels_to_zero" would not really overwrite anything, but either it makes a copy and modifies that or it creates just a mask, which tells other processors which pixels should be treated as if there were zeros (or nans or whatever ... missing data).

dneise commented 6 years ago

So ... hmm

If we ditch the whole "Container" concept. this is a major interface change I presume.

dneise commented 6 years ago

This is something one might implement in a "sprint" ... so we merge all open PRs ... we tell everybody who wants to help to come to geneva for 3 days and then we do it.

Etienne12345 commented 6 years ago

Just to be clear: I like the idea and agree with you that the containers might be an overcomplexification. This being said, shouldn't we focus on adding features to the analysis pipeline rather than getting the best possible design for it ? It will be replaced by the official ctapipe at some point, whether we like it or not...

dneise commented 6 years ago

Yes we should not find the best design, just one that works for us.

So either we say the containers hurt us so much, we need to get rid of them (costs quite a lot of time once) ... or we say the containers are complex, but we can live with them (might cost us a little time here and there).

Which version costs less time should be used ... but I find that hard to estimate.

Maybe there is one technical point I should repeat here. We can put into a Container what ever we like ... there is no need to put only "Fields" or "Maps" or whatever they call the stuff into a Container ..

dneise commented 6 years ago

So taking into account what Etienne said. I'd like to make a practical proposal, which does not involve a complete rewrite...

So Cyril said:

So here I see two options to "store" this information:

Each configurable is a python class, that one would instantiate when reading a file.

All of the information above is stored in the DataContainer() and accessible via: event.stuff

I must say I do not understand the difference. But I think I understand 2. and I think 2 is good.

As a technical note: One can put into a DataContainer whatever one wants, it does not have to be a child of Container or of type Map or Field.

I try to give an example using a use-case from above: "What is the DAC level of each LED?".

So I assume it goes like this:

you setup the calibration setup
you take some data

Then in the end you have a fits.fzfile which probably does not contain any information about the LED DAC settings. I just guess you have an additional file ... maybe a text file, which contains the DAC settings and timestamps. So you can always look up what DAC settings were applied when.

Now you want to combine that in your analysis, then I'd propose something like this:

stream = event_stream("your.fits.fz.file")
stream = add_cts_info(stream, "your_cts.txt.file")

for event in stream:
    print(event.dacs)  # --> probably a [1296, 2] array with 2 dac values one for AC one for DC or so .. dunno

maybe it is not event.dacs but somehow deeper inside like event.cts.leds.dacs. The function add_cts_info() should work similar to: https://github.com/cta-sst-1m/digicampipe/blob/9ce56785130d4a521789f526f7f635b9f483bcb7/digicampipe/io/event_stream.py#L36-L60

It looks at the event time .. tries to find the correct LED level file (what ever format it is) opens it and appends the information into the event stream.

calispac commented 6 years ago

In principle now we should have a file zfits file per setup config. So the it should not synchronize exactly like add_slow_data() on a time basis but on zfits file basis.

I am fine with what you propose

cta-sst-1m / digicampipe

Should DataContainer() only contain "data"? #125