WIP: Large scale Refactoring

Hckjs commented 4 months ago

WIP: It is the first very very draft on refactoring the whole lstchain code to

Move to the standard ctapipe data format (by preferably using the DataWriter)
Switch to the ctapipe config system
Rewrite and untangle the spaghettis by using/importing as much as possible from ctapipe - also preparing to integrate code step by step into ctapipe.

The first idea is to implement a LSTProcessor-Tool analogous to ctapipe's Processor Tool with LST-specific Components inheriting as much as possible from ctapipe classes. The aim here would be to mainly substitute the r0_to_dl1, dl1_ab and dl1_to_dl2 scripts to have just one tool that you can feed with different configs regarding your desired analysis step. The new analysis flow could look like:

1) R0 to DL1

lstchain_process --input r0_events.fits.fz --config config.yaml --output events.dl1.h5

Processing R0 data up to dl1 with Cat-A calibrations all defined in a base config.yaml. It can optionally write out muons and interleaved events (maybe also allow to only write out the interleaved pedestal events directly for Cat-B calibrations)

2) Reprocess DL1 (DL1ab)

lstchain_process --input events_cat_A.dl1.h5 --cat-b-calibrations cat_B-calibrations.h5  --config config.yaml config_dlab.yaml --output events_cat_B.dl1.h5

Not only for reprocessing dl1 data (e.g. with different cleaning settings...), but also for applying Cat-B calibrations and including pedestal cleaning. All defined in an additional config_dl1ab.yaml.

3) DL1 to DL2

lstchain_process --input Cat_B.dl1.h5 --config config_dl2.yaml --output events.dl2.h5

Process DL1 to DL2 with a specific (src (in)dependet) config_dl2.yaml.

Or directly from R0 to DL2

lstchain_process --input r0_events.fits.fz --cat-b-calibrations cat_b_calibrations.h5 --config config.yaml --output events.dl2.h5

To process directly from R0 to DL2 (with Cat-B calibs) you can first write out the interleaved events with the 'interleaved-only'-mode and compute the cat-B calibrations with the separate tool. After that you hand in your config, the cat-B calibrations and RF-Models (maybe trained by ctapipe's ML module - ctapipe-train-...) to the processor tool and process directly to DL2.

MC processing should be analogous to the above scheme.

Tasks

The tool would basically look like ctapipe's Processor tool:
```
for event in source:

if event.trigger.event_type in {
    EventType.FLATFIELD,
    EventType.SKY_PEDESTAL,
}:
    self.process_interleaved(event)

if self.should_tune_waveforms:
    self.tune_waveforms(event)

if self.should_calibrate:
    self.calibrate(event)

if self.should_compute_dl1:
    self.process_images(event)

if self.should_compute_muon_parameters:
    self.process_muons(event)

if self.should_compute_lhfit_parameters:
    self.process_lhfit(event)

if self.should_compute_dl2:
    self.process_shower(event)

if not self.write_interleaved_only:
    self.write(event)
```
with different Components passing and filling the ArrayEventContainer. That includes:
- [ ] LSTInterleavedProcessor component which process the interleaved pedestal and flatfield events with the CalibrationCalculator. There should also be the possibility of writing only the r1 waveforms to an extra .h5 file to calculate the cat-B coefficients. For that it would be also nice to add a flag in ctapipe_io_lst for the gain selection of calibration events, so we don't have to calibrate at this level.
- [ ] WaveformNSBTuner component to calculate (based on a target file) and apply NSB on waveform-level to MC-data.
- [ ] LSTCameraCalibrator component which inherits from ctapipe's CameraCalibrator. It needs to be adapted for applying the cat-B calibrations consisting of either filling the DL1CameraCalibrationContainer and applying them in the ._calibrate_dl1 step or applying them afterwards on peak times and charges if no waveforms are available (for 'dl1ab').
- [ ] LSTImageProcessor component inheriting from ctapipe's ImageProcessor which uses its own LSTImageCleaner. It needs to be adapted regarding the API of the ImageCleaner to allow pedestal cleaning.
- [ ] LSTMuonProcessor component inheriting from ctapipe's MuonProcessor adding an own CameraCalibrator instance which can use a different extractor (e.g. GlobalPeakWindowSum) for the muon analysis. It also creates its on DataWriter instance for writing promising muon events to an extra .h5 file.
- [ ] LHFitProcessor component which computes lhfit parameters, fills its own container an write it to the output file.
- [ ] ShowerProcessor component. I did not yet dive deep enough into ctapipe's ML module and lstchains dl1_to_dl2 step to know exactly what we can use/inherit from. WIP.

A first basic config can be found in lstchain/data/lstchain_base_config.yaml For contributing please open new PR's to this branch and reference them in the associated tasks above to not overload this PR for reviewers. I would suggest at least one PR per component

Additional tasks:

[ ] Update datachecks
[ ] Update (onsite)-scripts (and maybe convert some of them to tools)
[ ] Update notebooks
[ ] Delete obsolete modules/functions

I'm looking forward to all your ideas and comments. Feel free to edit/adapt the tasks.

moralejo commented 4 months ago

Hi @Hckjs , I don't understand why to implement this in lstchain a la ctapipe, rather than adapting lstchain to create (for the existing data) DL0 files that can then be fully processed with ctapipe. That is the idea behind https://github.com/cta-observatory/cta-lstchain/issues/1176 (nothing done yet as far as I know, just informal discussion off-github). Doesn't that make more sense?

maxnoe commented 4 months ago

@moralejo this is the more long standing issue of https://github.com/cta-observatory/cta-lstchain/issues/972

Are you saying we should skip that step and directly go converting DL0 files and using only ctapipe? I think this will take longer to get everything we currently do differently / additionally into ctapipe, and this exercise here is actually a large part of what is needed to do that (identify what is different, convert code into how ctapipe would do things, then start moving those to ctapipe itself).

I think this is largely independent of the question if we read already calibrated DL0 using ctapipe_io_zfits or do the calibraation on the fly using ctapipe_io_lst from R0 / R1 files. This is about the steps that come after.

moralejo commented 4 months ago

@moralejo this is the more long standing issue of #972

Are you saying we should skip that step and directly go converting DL0 files and using only ctapipe?

I think that is a better long-term solution for dealing with the existing LST-1 data.

I think this will take longer to get everything we currently do differently / additionally into ctapipe, and this exercise here is actually a large part of what is needed to do that (identify what is different, convert code into how ctapipe would do things, then start moving those to ctapipe itself).

Those additional developments (to process LST DL0 data in the way we want to, i.e. with the features currently implemented in lstchain) should be done in ctapipe. Why implementing them in lstchain before, and then export them to ctapipe? Let's focus instead in producing LST1-DL0 which can be used in the improvement of the ctapipe pipeline for real data. I see no advantage in doing it first in lstchain, and at the same time I see a large potential for accidentally introducing problems in our working (though cumbersome) system.

I think this is largely independent of the question if we read already calibrated DL0 using ctapipe_io_zfits or do the calibration on the fly using ctapipe_io_lst from R0 / R1 files. This is about the steps that come after.

Indeed, but if one gets to CTA-standard DL0 with lstchain, the later steps can be done in ctapipe.

Also, doing the developments in ctapipe will make more clear that they belong (as they should) to the DPPS IKCs, rather than being LST-internal developments.

cta-observatory / cta-lstchain