cta-sst-1m / digicampipe

DigiCam pipeline based on ctapipe
GNU General Public License v3.0

File headers and current way of reading files #124

Open calispac opened 6 years ago

calispac commented 6 years ago

At the moment we access data from a file this way:

events = event_stream(url='my_file.zfits')
for event in events:
    # do something with event but need header of .zfits file

But let's imagine that I need some information from the header of the zfits file (no header exists to this day, but in the future it might hold some info on the data-taking "conditions").

As it is right now I cannot read the header because:

  1. Header is not propagated to the DataContainer() (should it be?)
  2. I cannot use the ZFile() object created inside event_stream().

One solution could be to require event_stream() to take a ZFile() as argument:

my_zfile = ZFile(url='my_file.zfits')
events = event_stream(my_zfile)

for event in events:
    if my_zfile.header.some_value > 0:
        # do my stuff with event

calispac commented 6 years ago

I am also facing this issue with the Monte Carlo files. They have a header that I would like to look at during the analysis.

I think I will make something equivalent to ZFile() but for Digicamtoy files where the header is a class attribute.
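A minimal sketch of what such a reader could look like, assuming an in-memory stand-in instead of a real Digicamtoy file. The class name `ToyFile`, the header key `nsb_rate`, and the record layout are all hypothetical and are not part of digicampipe or of the ZFile() API.

```python
class ToyFile:
    """Hypothetical reader: the file-level header is an attribute of the object."""

    def __init__(self, header, records):
        # The header is read once and kept for the lifetime of the reader.
        self.header = dict(header)
        self._records = list(records)

    def __iter__(self):
        # Iterating the reader yields the raw per-event records.
        return iter(self._records)


# Made-up header and records, only to show the access pattern.
toy = ToyFile(header={'nsb_rate': 0.6}, records=[[1, 2], [3, 4]])
n_events = sum(1 for _ in toy)
```

With this shape, the analysis code can consult `toy.header` at any time while looping over the events.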

calispac commented 6 years ago

This also raises the question: should we keep everything in a DataContainer(), as we are more or less doing right now (e.g. geometry), even if it does not change per event? Maybe it would be good to split:

  1. Geometry (through Camera())
  2. Header/Metadata (with the file readers e.g. ZFile())
  3. The events (with DataContainer())

see #125
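The split above could be sketched roughly as follows. This is only an illustration: `Geometry`, `Header`, `Event`, and `analyse` are hypothetical stand-ins for Camera(), the file header, and DataContainer(), and the fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical stand-in for Camera(): fixed for the whole run."""
    n_pixels: int


@dataclass(frozen=True)
class Header:
    """Hypothetical file-level metadata, owned by the file reader."""
    some_value: float


@dataclass
class Event:
    """Hypothetical stand-in for DataContainer(): per-event data only."""
    event_id: int
    adc: list


def analyse(events, header, geometry):
    # Each of the three pieces is passed explicitly to the function that needs it.
    for event in events:
        if header.some_value > 0 and len(event.adc) == geometry.n_pixels:
            yield event.event_id


geometry = Geometry(n_pixels=2)
header = Header(some_value=1.0)
events = [Event(0, [10, 11]), Event(1, [12, 13])]
selected = list(analyse(events, header, geometry))
```

The cost of this design is that every function needing the header or the geometry has to receive it as an extra argument.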

dneise commented 6 years ago

I think the event should contain everything which is needed to analyse it. This way one can be sure to have everything one needs when handing an event from one function to the next .. one will never say:

"Oh but this function also needs to know the power consumption of the elevation motor, but ... ohhhh ... it has no access to this information, so now we have to modify the entire design"

I think the event just needs to be a big bucket of all the information, which might be needed to analyze it.

moderski commented 6 years ago

I am not familiar with all the structures you are talking about :-), but "the event should contain everything which is needed to analyze it" looks like large overkill, especially since some data are generated neither by the camera nor even by the telescope (e.g. ambient temperature, humidity, cloudiness), and some change very slowly (flat-fielding corrections, pointing corrections, mirror reflectivity, etc.). In this respect, I find Cyril's "split" idea quite attractive. It would allow something like:

my_zfile = ZFile(url='my_file.zfits')
# load geometry and calibration data based on my_file.header
events = event_stream(my_zfile)

for event in events:
    # do my stuff with event

dneise commented 6 years ago

I think I have a problem understanding the example of Cyril ... let me repeat it here and explain how I don't get it. This was the original example.

my_zfile = ZFile(url='my_file.zfits')
events = event_stream(my_zfile)

for event in events:
    if my_zfile.header.some_value > 0:
        # do my stuff with event

What typically happens in our code is that the stream, called events here, is passed to a function, like this:

my_zfile = ZFile(url='my_file.zfits')
events = event_stream(my_zfile)

events = calculate_baseline(events)

This function only gets the events .. it does not see my_zfile so this does not work:

#inside calculate_baseline.py
def calculate_baseline(events):
    for event in events:
        if my_zfile.header.some_value > 0:
           #^^^------------> BOOOOM at this point: Symbol `my_zfile` is not known in this scope!
            # do my stuff with event

Therefore, I was proposing to attach virtually everything to the events in the event stream, since then we do not have to touch every function that might do an analysis.
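A minimal sketch of that idea, assuming each event simply carries a reference to the shared header so that functions which only receive the stream can still read it. The dictionary layout is hypothetical, and the `event_stream` and `calculate_baseline` below are toy versions (the "baseline" is just a mean), not the digicampipe implementations.

```python
def event_stream(header, records):
    """Toy generator: every event holds a reference to the same header object."""
    for event_id, data in enumerate(records):
        # The header is not copied; all events share one object.
        yield {'event_id': event_id, 'data': data, 'header': header}


def calculate_baseline(events):
    """Toy processing step that only receives the stream of events."""
    for event in events:
        # The header travels with the event, so no outer variable is needed.
        if event['header']['some_value'] > 0:
            event['baseline'] = sum(event['data']) / len(event['data'])
        yield event


header = {'some_value': 1}
events = event_stream(header, [[1, 2, 3], [4, 5, 6]])
events = calculate_baseline(events)
baselines = [event['baseline'] for event in events]
```

Because only a reference is attached, slowly changing or file-level information is stored once, while the signature of `calculate_baseline(events)` stays unchanged.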