cta-observatory / cta-lstchain

LST prototype testbench chain
https://cta-observatory.github.io/cta-lstchain/
BSD 3-Clause "New" or "Revised" License
24 stars 77 forks

Which data format for interleaved calibration data in DL1? #337

Open FrancaCassol opened 4 years ago

FrancaCassol commented 4 years ago

Hi,

I am finalising the code to calculate statistical distributions and calibration values from interleaved flat-field and pedestal events. The point now is to decide where to put the results in the DL1 file.

We can imagine adding three new tables (for pedestal, flat-field and calibration separately). Then, we add to each shower event the number of the calibration event used to calibrate it (which makes it possible to find the employed calibration coefficients in the calibration table). In the same way, each calibration event will contain the numbers of the flat-field and pedestal events used for its calculation. The first row of all three tables will contain the calibration values coming from the calibration run at the beginning of the night, which will be used for the first shower events (when not enough statistics from interleaved events have been accumulated yet) or for all events if no interleaved event analysis is performed (as we do now). This guarantees that the calibration coefficients actually employed are always available inside the DL1 file.
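A minimal sketch of the proposed linkage (table and column names here are illustrative only, not an actual lstchain schema): three monitoring tables whose row 0 holds the start-of-night calibration run, and a per-event pointer into the calibration table.

```python
# Row 0 of every table comes from the start-of-night calibration run, so
# even without interleaved analysis each shower event can point to it.
pedestal_table = [
    {"ped_id": 0, "mean_time": 0.0, "charge_mean": 8.1},   # calibration run
    {"ped_id": 1, "mean_time": 35.0, "charge_mean": 8.3},  # interleaved update
]
flatfield_table = [
    {"ff_id": 0, "mean_time": 0.0, "rel_gain": 1.00},
    {"ff_id": 1, "mean_time": 40.0, "rel_gain": 0.98},
]
# Each calibration row records which pedestal/flat-field rows produced it.
calibration_table = [
    {"calib_id": 0, "ped_id": 0, "ff_id": 0, "dc_to_pe": 0.021},
    {"calib_id": 1, "ped_id": 1, "ff_id": 1, "dc_to_pe": 0.020},
]
# Each shower event stores only the id of the calibration row used on it.
shower_events = [
    {"event_id": 100, "calib_id": 0},  # early events: calibration run values
    {"event_id": 500, "calib_id": 1},  # later events: interleaved results
]

def coefficients_for(event):
    """Follow the event -> calibration -> pedestal/flat-field links."""
    calib = calibration_table[event["calib_id"]]
    ped = pedestal_table[calib["ped_id"]]
    ff = flatfield_table[calib["ff_id"]]
    return calib["dc_to_pe"], ped["charge_mean"], ff["rel_gain"]

print(coefficients_for(shower_events[1]))  # -> (0.02, 8.3, 0.98)
```

With this layout the DL1 file stays self-describing: following two indices recovers every coefficient that touched a given shower event.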

Comments, suggestions?

jsitarek commented 4 years ago

Hi,

I guess you mean that each shower event will have 3 numbers, one for each table, right? Note that you need a different number of events for redoing the F-factor calibration of the conversion factors and a different (lower) number for the pedestal calculation. Also, those are different events, so even if you take pedestal and calibration events at the same frequency they may drift out of sync from time to time. This is also important for the pedestal table, because it should have the pedestal bias and RMS already converted to phe, but it might happen that the conversion factors change in the middle of a pedestal update.

Last thing, what do you do between runs? If you switch to another source then of course the procedure should start from scratch, but if you continue with the same source and just change runs you could in principle continue with the updates. Still, I think it is safer to just do the calibration independently for each run; otherwise you would have to check that the position of the telescope did not change (even wobbling would be a problem) and that only a short time has passed.
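One way to handle such drift, assuming each monitoring row carries the time at which it was computed, is to look up for each shower event the latest row preceding it rather than assuming the tables stay aligned. A sketch with made-up times:

```python
import numpy as np

# Monitoring rows are only valid from their computation time onwards, so
# we pick, for each shower event, the latest row that precedes it.
# Times (seconds) are illustrative; pedestal and calibration updates drift
# out of sync because they come from independent interleaved event streams.
pedestal_times = np.array([0.0, 30.0, 61.0, 92.0])      # ped_id 0..3
calibration_times = np.array([0.0, 58.0, 118.0])        # calib_id 0..2

def latest_row_before(times, event_time):
    """Index of the last monitoring row computed at or before event_time."""
    return int(np.searchsorted(times, event_time, side="right") - 1)

event_time = 65.0
print(latest_row_before(pedestal_times, event_time))     # -> 2
print(latest_row_before(calibration_times, event_time))  # -> 1
```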

FrancaCassol commented 4 years ago

Hi @julian,

I guess you mean that each shower event will have 3 numbers, one for each table, right?

Actually, I was thinking of associating to each shower event only the number of the calibration event (which contains the gain and pedestal values) used to calibrate it, and of keeping inside the calibration event the event numbers of the pedestal and FF events used for calculating the calibration (there will obviously also be the mean time and the time range of the events used for building the statistical distributions).

Note that you need a different number of events for redoing the F-factor calibration of the conversion factors and a different (lower) number for the pedestal calculation. Also, those are different events, so even if you take pedestal and calibration events at the same frequency they may drift out of sync from time to time.

Indeed, with this scheme the calibration, FF and pedestal tables can go out of sync without any problem.

This is also important for the pedestal table, because it should have the pedestal bias and RMS already converted to phe, but it might happen that the conversion factors change in the middle of a pedestal update.

Ok, I didn't think of that. In this case I would suggest adding to the pedestal and flat-field tables the number of the calibration event closest in time. That would make it possible to extract the calibrated values when needed, but also to easily calculate new calibration values from the raw pedestal and FF values (or do you think it is worth writing the calibrated ped and FF directly?).
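Storing, next to the calibrated pedestal, the id of the calibration row whose conversion factors were applied keeps the calibration reversible. A sketch with made-up numbers:

```python
# Why storing the calibration id alongside calibrated pedestals keeps them
# reversible (dc_to_pe values and the pedestal are made up for illustration).
dc_to_pe = {0: 0.021, 1: 0.020}   # conversion factor per calibration row

ped_mean_phe = 8.40               # pedestal bias stored already in phe
ped_calib_id = 0                  # conversion factors used at write time

# Undo: recover the raw (ADC) pedestal from the stored phe value...
ped_mean_adc = ped_mean_phe / dc_to_pe[ped_calib_id]
# ...then redo with a newer set of conversion factors if needed.
ped_mean_phe_new = ped_mean_adc * dc_to_pe[1]

print(round(ped_mean_adc, 1))       # -> 400.0
print(round(ped_mean_phe_new, 2))   # -> 8.0
```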

By the way, for the moment we keep the same statistics for FF and ped. What difference in statistics do you use in MAGIC?

Last thing, what do you do between runs? If you switch to another source then of course the procedure should start from scratch, but if you continue with the same source and just change runs you could in principle continue with the updates. Still, I think it is safer to just do the calibration independently for each run; otherwise you would have to check that the position of the telescope did not change (even wobbling would be a problem) and that only a short time has passed.

Yes, I was thinking of starting from scratch at each run; this is simpler and I don't see major drawbacks.

jsitarek commented 4 years ago

I would personally prefer that the pedestal tables are already calibrated into phe, because this is what will be needed later on in the chain, but keep the number of the calibration that was used so one can always undo or redo the calibration if needed.

In MAGIC I use 500 events for the pedestal estimate and 1000 events for the calibration. The reason is that I want pedestals updated as soon as possible, while for the calibration I need a precise estimate of the RMS to get the F-factor calibration right, and this needs a lot of statistics. Since LST is taking interleaved events at a higher frequency, if you make it 1000 and 2000 events you will get them updated with the same frequency as I do in MAGIC, and the numbers will be somewhat more precise.

FrancaCassol commented 4 years ago

I would personally prefer that the pedestal tables are already calibrated into phe, because this is what will be needed later on in the chain, but keep the number of the calibration that was used so one can always undo or redo the calibration if needed.

Ok. Let's do like that.

In MAGIC I use 500 events for the pedestal estimate and 1000 events for the calibration. The reason is that I want pedestals updated as soon as possible, while for the calibration I need a precise estimate of the RMS to get the F-factor calibration right, and this needs a lot of statistics. Since LST is taking interleaved events at a higher frequency, if you make it 1000 and 2000 events you will get them updated with the same frequency as I do in MAGIC, and the numbers will be somewhat more precise.

Yes, in order to have the same update frequency for ped and FF we could also ask to take interleaved FF events at double the frequency. Since CTA asks for at least 100 Hz for calibration events, we could think of using 100 Hz for pedestals and 200 Hz for FF. To be discussed.
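For reference, the update cadence these numbers imply: with 1000-event pedestal batches at 100 Hz and 2000-event FF batches at 200 Hz, both tables would refresh every 10 s.

```python
# Update-interval arithmetic for the rates and batch sizes discussed above.
ped_rate_hz, ff_rate_hz = 100, 200          # proposed interleaved rates
ped_events, ff_events = 1000, 2000          # events per update (MAGIC x2)

ped_interval_s = ped_events / ped_rate_hz   # seconds between ped updates
ff_interval_s = ff_events / ff_rate_hz      # seconds between FF updates
print(ped_interval_s, ff_interval_s)        # -> 10.0 10.0
```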

jsitarek commented 4 years ago

One can take them at different frequencies, but you will still get them slightly out of sync at the beginning/end of the update, just because those are different types of events and are taken independently (some events can fall into the dead time of the readout, the laser might not start shooting immediately, etc.).

kosack commented 4 years ago

Are you talking about calibration Monitoring data here? E.g. pedestals or ff coeffs that are calculated every few thousand events? Or instantaneous values? If the former, it should be tables in /dl1/monitoring/telescope/[...] , probably following the monitoring table data model from ACADA, or at least containing some standard columns like an id (to link to events), start and end times of the computation, number of events used (might need to be a vector per pixel), parameters per pixel (e.g. pedestal, pedestal_variance, ...). There should be a common data model for all tables that store monitoring data.
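A minimal sketch of such a common row layout (all names are placeholders, not the ACADA model or an agreed CTAO schema): a base class with the shared columns, specialised per monitoring quantity.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical common columns every monitoring table row could share.
@dataclass
class MonitoringRow:
    id: int                      # links events to this monitoring entry
    start_time: float            # start of the computation window
    end_time: float              # end of the computation window
    n_events: List[int] = field(default_factory=list)  # per pixel, if needed

# A specialised table adds its per-pixel parameters on top.
@dataclass
class PedestalRow(MonitoringRow):
    pedestal: List[float] = field(default_factory=list)           # per pixel
    pedestal_variance: List[float] = field(default_factory=list)  # per pixel

row = PedestalRow(id=0, start_time=0.0, end_time=12.5,
                  n_events=[500, 500], pedestal=[8.1, 8.3],
                  pedestal_variance=[2.0, 2.1])
print(row.id, row.end_time - row.start_time)  # -> 0 12.5
```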

This is something that really needs to be defined by (or with) CTAO. Otherwise, be ready to change it if there are standards developed later.

FrancaCassol commented 4 years ago

Hi @kosack,

Are you talking about calibration Monitoring data here? E.g. pedestals or ff coeffs that are calculated every few thousand events?

Yes, I am talking about these.

If the former, it should be tables in /dl1/monitoring/telescope/[...] , probably following the monitoring table data model from ACADA,

Could you please send a link to the document that describes it (or the contact person who is working on that)?

or at least containing some standard columns like an id (to link to events), start and end times of the computation, number of events used (might need to be a vector per pixel), parameters per pixel (e.g. pedestal, pedestal_variance, ...).

Yes, this is what I have in mind

There should be a common data model for all tables that store monitoring data. This is something that really needs to be defined by (or with) CTAO. Otherwise, be ready to change it if there are standards developed later.

I totally agree, and it would obviously be better to develop the standards now ;-) But since I am afraid the time scales are different, we could start with a first "well thought out" model in LST, which can then help to finalise the standards later. What do you think?