calispac opened this issue 6 years ago
In my opinion everything should be reachable from the event.
It does not seem to make sense to merge data and setup in .zfits and then split them again during analysis.
> In my opinion everything should be reachable from the event.

In that case, should we "allow" containers to hold not only Field() objects but any Python object?
For instance one could have the current InstrumentContainer()
https://github.com/cta-sst-1m/digicampipe/blob/b21e1a1d10dd28cb828f25653921377bb81a6ce2/digicampipe/io/containers.py#L39-L56
be replaced by:
```python
class InstrumentContainer(Container):
    pdp = utils.Camera()        # replaces patch, cluster, num_pixels etc.
    optics = CameraOptics()     # not needed at the moment
    cts = CameraTestSetup()     # state of the CTS
    writer = ZfitsWriter()      # state of the writer
    digicam = DigiCam()         # config/state of DigiCam
    # etc.
```
If the information on the data-taking conditions is in the .zfits header, should we let event_stream() fill each of these Container attributes?
I would even drop the entire ctapipe.Container idea completely. It is the wrong way, in my opinion. A dict is just fine.

> I would even drop the entire ctapipe.Container idea completely. It is the wrong way, in my opinion.

Yes, maybe too advanced for now. The usual Python data structures are fine.
Ah, one addition: I really like namedtuple ... when we want a dict-like container where we somehow know the keys in advance, like when we read an event: we already know all the keys from the .proto file.
Also, namedtuples are a little bit "read-only", so we can be somewhat sure no process is accidentally overwriting the ADC samples.
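To illustrate both points with plain standard-library Python (the `Event` fields here are made up, not the actual .proto schema): the keys are fixed at definition time, and fields cannot be silently reassigned.

```python
from collections import namedtuple

# Field names known up front, e.g. as they would come from a .proto file.
Event = namedtuple("Event", ["event_id", "adc_samples"])

event = Event(event_id=1, adc_samples=(10, 12, 11))

# Attempting to overwrite a field raises AttributeError instead of
# silently clobbering the ADC samples.
try:
    event.adc_samples = (0, 0, 0)
except AttributeError:
    print("namedtuple fields are read-only")

# To "modify", you must explicitly create a new instance; the original
# stays untouched.
masked = event._replace(adc_samples=(0, 0, 0))
print(masked.adc_samples)  # (0, 0, 0)
print(event.adc_samples)   # (10, 12, 11)
```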
I like the idea of processors only adding information to the event. A processor like "set_pixels_to_zero" would not really overwrite anything: either it makes a copy and modifies that, or it just creates a mask which tells other processors which pixels should be treated as if they were zeros (or NaNs or whatever ... missing data).
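The mask idea could be sketched roughly like this (everything here is hypothetical: the processor name, the event layout as plain dicts, and the key names are illustrations, not digicampipe's actual API):

```python
def set_pixels_to_zero(stream, bad_pixels):
    """Processor that adds information instead of overwriting it: the ADC
    samples stay untouched, and a boolean mask tells downstream processors
    which pixels to treat as missing (zero / NaN / whatever)."""
    bad = set(bad_pixels)
    for event in stream:
        n_pixels = len(event["adc_samples"])
        event["pixel_mask"] = [i not in bad for i in range(n_pixels)]
        yield event


# Toy usage, with plain dicts standing in for real events:
events = [{"adc_samples": [10, 12, 11, 9]}]
for event in set_pixels_to_zero(events, bad_pixels=[2]):
    print(event["pixel_mask"])   # [True, True, False, True]
    print(event["adc_samples"])  # untouched: [10, 12, 11, 9]
```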
So ... hmm
If we ditch the whole "Container" concept, this is a major interface change, I presume.
This is something one might implement in a "sprint": we merge all open PRs, we tell everybody who wants to help to come to Geneva for three days, and then we do it.
Just to be clear: I like the idea and agree with you that the containers might be an over-complication. That being said, shouldn't we focus on adding features to the analysis pipeline rather than on finding the best possible design for it? It will be replaced by the official ctapipe at some point, whether we like it or not...
Yes, we should not look for the best design, just one that works for us.
So either we say the containers hurt us so much that we need to get rid of them (costs quite a lot of time, once), or we say the containers are complex but we can live with them (might cost us a little time here and there).
Whichever version costs less time should be used ... but I find that hard to estimate.
Maybe there is one technical point I should repeat here: we can put whatever we like into a Container ... there is no need to put only "Fields" or "Maps" or whatever they call the stuff into a Container.
So, taking into account what Etienne said, I'd like to make a practical proposal which does not involve a complete rewrite...
So Cyril said:

> So here I see two options to "store" this information:
> - Each configurable is a Python class, that one would instantiate when reading a file.
> - All of the information above is stored in the DataContainer() and accessible via `event.stuff`.
I must say I do not understand the difference. But I think I understand option 2, and I think option 2 is good.
As a technical note: one can put into a DataContainer whatever one wants; it does not have to be a child of Container, or of type Map or Field.
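To make that technical note concrete in plain Python (the `Container` below is a bare stand-in attribute bag, not the real ctapipe class): nothing stops you from attaching arbitrary objects to an instance.

```python
class Container:
    """Stand-in for a container class; any attribute bag would do."""
    pass


event = Container()
event.dacs = [[255, 0]] * 1296         # a plain nested list, no Field needed
event.note = {"run": 42, "led": "AC"}  # a plain dict, no Map needed

print(len(event.dacs))    # 1296
print(event.note["run"])  # 42
```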
I'll try to give an example using a use case from above: "What is the DAC level of each LED?"
So I assume it goes like this:
In the end you have a fits.fz file which probably does not contain any information about the LED DAC settings. I just guess you have an additional file, maybe a text file, which contains the DAC settings and timestamps, so you can always look up which DAC settings were applied when.
Now, if you want to combine that in your analysis, I'd propose something like this:

```python
stream = event_stream("your.fits.fz.file")
stream = add_cts_info(stream, "your_cts.txt.file")
for event in stream:
    print(event.dacs)  # probably a [1296, 2] array: one DAC value for AC, one for DC or so ... dunno
```

Maybe it is not event.dacs but something deeper inside, like event.cts.leds.dacs.
The function add_cts_info() should work similarly to:
https://github.com/cta-sst-1m/digicampipe/blob/9ce56785130d4a521789f526f7f635b9f483bcb7/digicampipe/io/event_stream.py#L36-L60
It looks at the event time, tries to find the correct LED-level file (whatever format it is), opens it, and appends the information to the event stream.
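A hedged sketch of what such a function could look like. The text-file format, the `local_time` key, and the plain-dict events are all assumptions for illustration, not digicampipe's actual API:

```python
def load_dac_settings(path):
    """Assumed text format: one line per settings change,
    'timestamp ac_dac dc_dac'."""
    settings = []
    with open(path) as f:
        for line in f:
            t, ac, dc = line.split()
            settings.append((float(t), int(ac), int(dc)))
    return sorted(settings)


def add_cts_info(stream, dac_file):
    """Look up the DAC settings valid at each event's timestamp and
    append them to the event (plain dicts stand in for real events)."""
    settings = load_dac_settings(dac_file)
    for event in stream:
        # take the last settings entry with timestamp <= event time
        valid = [s for s in settings if s[0] <= event["local_time"]]
        event["dacs"] = valid[-1][1:] if valid else None
        yield event
```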
In principle we should now have one zfits file per setup config.
So it should not synchronize on a time basis, exactly like add_slow_data() does, but on a per-zfits-file basis.
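Synchronizing per zfits file instead of per timestamp could then reduce to a lookup keyed by the file each event came from. The `filename` key and the dict-shaped events are hypothetical, assumed here to be exposed by the stream:

```python
def add_setup_info(stream, setup_by_file):
    """Attach one setup/config object per zfits file.

    setup_by_file maps a zfits filename to its setup description;
    assumes each event carries the name of the file it was read from.
    """
    for event in stream:
        event["setup"] = setup_by_file[event["filename"]]
        yield event


# Toy usage:
events = [{"filename": "run_001.fits.fz"}, {"filename": "run_002.fits.fz"}]
setups = {"run_001.fits.fz": {"dac": 200}, "run_002.fits.fz": {"dac": 255}}
for event in add_setup_info(events, setups):
    print(event["setup"]["dac"])
```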
I am fine with what you propose
For the calibration we need to keep track of the "conditions" in which the data were taken.
Example:
To my understanding, what is listed above is going to be part of the .zfits header (correct? @Etienne12345). We also need a description of the camera geometry + hardware components (but this is static in principle, i.e. not configurable). We have utils.DigiCam() for that already.
So here I see two options to "store" this information:
- Each configurable is a Python class, that one would instantiate when reading a file.
- All of the information above is stored in the DataContainer() and accessible via event.stuff.