cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
64 stars 268 forks source link

Migration of flatfield and pedestal components from lstchain #2519

Closed TjarkMiener closed 7 months ago

TjarkMiener commented 7 months ago

This PR draft adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe. Shared functionality is moved into a parent component, while the different subcomponents hold the _process() function for the conventional processing of flatfield and pedestal events. Additional subcomponents with complex processing routines of flatfield and pedestal events can be added in the future.

maxnoe commented 7 months ago

Hi @TjarkMiener thanks, this is much needed.

However, I think we should first discuss a bit how we want the general API of these classes to look like.

I think we certainly don't want to keep the current one.

There are a couple of related open issues:

FrancaCassol commented 7 months ago

Hi @TjarkMiener thanks, this is much needed.

However, I think we should first discuss a bit how we want the general API of these classes to look like.

Hi @maxnoe,

I think for the moment @TjarkMiener just plans to optimised the present code. However, I agree on the points you raised, it is probably worth to develop a more general view before starting.

maxnoe commented 7 months ago

@FrancaCassol I think it doesn't make much sense to apply any optimization to the existing code, as we know we need to replace it completely relatively soon.

maxnoe commented 7 months ago

The logic is already somewhat different to what is implemented in LST currently, so clearly this at least has some new development.

TjarkMiener commented 7 months ago

This PR draft was meant to refactor the current lstchain components for flatfield and pedestal to remove repeated code and start a discussion on the migration. In addition, I added this config parameter, which allows the calculation of the stats after a given iteration and not once the sample buffer is filled. Originally I had in mind to follow what @kosack suggested (pre-process/integrate charges and arrival times, dump dl1-like events, and then table read plus the calculation of coefficients). I dropped this idea because a presumed 2x 100Hz calibration rate and an obs block duration of 20-25 mins would exceed the RAM limitation of 2GB per job when reading the whole table. However, as we discussed in a separate thread, the calibration rate will be likely reduced in the near future. So I guess we could reconsider the suggestion of integrating and temporarily storing dl1-like calibration data to read them afterward using the TableLoader and apply functions to all events at once.

The idea I followed in this PR for the design of the components was to be able to flexibly add subcomponents that could host the "fancy algorithms" via the _process() function. You could in principle increase the sample_size and then apply the complex functions on the self.charges and self.arrival_times arrays as you would do for all the events read with the TableLoader. Those complex functions could even be applied on a pixel-wise level, i.e. we might want to treat pixels that are affected by stars differently. The current subcomponents FlatFieldExtractor and PedestalExtractor were targeting the cat-B processing, while fancy subcomponents can be added in the future for cat-C processing.

I like the idea proposed in #1226 for the on-the-fly computation of flatfield and pedestals, even though we might want to store the charge statistics monitoring of the calibration events for the data quality monitoring.

maxnoe commented 7 months ago

The table loader also supports chunked loading, which allows you to iterate through chunks and keeping consecutive chunks in memory, you only need roughly the amount of events of your windows sizes.

TjarkMiener commented 7 months ago

True, good point! My initial plan was to follow a workflow like: integrating charges and extracting arrival times via the process tool ($ ctapipe-process ... --write_images --no-write_parameters). And then code a tool for the calculation of the camera calibration coefficients based on the TableLoader. Therefore, I would need to make the process tool compatible with calibration events, i.e. allowing two gains integration. By using the process tool we would directly ensure to follow the standard ctapipe DL1 data model, which then probably needs some adjustments for the allowance of two gains images and peak_times. I think the dl1-like files for the calibration events can be stored temporarily. One advantage would also be that those dl1-like calibration files could be sent directly to the off-site data center after production rather than the waveforms of the calibration events. A naive guess from my side would be that we don't want to repeat the charge integration and arrival time extraction for the calibration events each time we are calculating camera calibration coefficients (on- or off-site).

I'm closing this draft for now and working on the suggestion above in a separate branch. In case you think we should follow the design of this PR feel free to comment and I will reopen this draft. Thanks for the discussion and suggestion!