cta-observatory / cta-lstchain

LST prototype testbench chain
https://cta-observatory.github.io/cta-lstchain/
BSD 3-Clause "New" or "Revised" License

WIP: Large scale Refactoring #1232

Open Hckjs opened 4 months ago

Hckjs commented 4 months ago

WIP: This is a very early first draft of refactoring the whole lstchain code to

The first idea is to implement an LSTProcessor tool, analogous to ctapipe's Processor Tool, with LST-specific components that inherit as much as possible from ctapipe classes. The aim is mainly to replace the r0_to_dl1, dl1_ab and dl1_to_dl2 scripts with a single tool that you feed with different configs depending on the desired analysis step. The new analysis flow could look like:

1) R0 to DL1

lstchain_process --input r0_events.fits.fz --config config.yaml --output events.dl1.h5

Processes R0 data up to DL1 with Cat-A calibrations, all defined in a base config.yaml. It can optionally write out muons and interleaved events (maybe it should also allow writing out only the interleaved pedestal events directly, for Cat-B calibrations).

2) Reprocess DL1 (DL1ab)

lstchain_process --input events_cat_A.dl1.h5 --cat-b-calibrations cat_B-calibrations.h5 --config config.yaml config_dl1ab.yaml --output events_cat_B.dl1.h5

This step is not only for reprocessing DL1 data (e.g. with different cleaning settings), but also for applying Cat-B calibrations and including pedestal cleaning, all defined in an additional config_dl1ab.yaml.
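Passing two files to --config suggests that the second one only carries the settings that differ from the base config. A minimal sketch of that layering, assuming later files override earlier ones key by key; the config keys and values below are hypothetical stand-ins, not the actual contents of config.yaml or config_dl1ab.yaml:

```python
from copy import deepcopy


def merge_configs(*configs):
    """Merge nested config dicts; later configs override earlier ones."""
    merged = {}
    for config in configs:
        _merge_into(merged, config)
    return merged


def _merge_into(target, update):
    """Recursively copy the entries of `update` into `target`."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are sections: merge them instead of replacing.
            _merge_into(target[key], value)
        else:
            # Copy so the merged result never aliases the input configs.
            target[key] = deepcopy(value)


# Hypothetical stand-in for the base config.yaml
base = {
    "ImageCleaner": {"picture_threshold": 8.0, "boundary_threshold": 4.0},
    "write_muons": True,
}
# Hypothetical stand-in for config_dl1ab.yaml: only the changed settings
dl1ab = {"ImageCleaner": {"picture_threshold": 10.0}}

config = merge_configs(base, dl1ab)
print(config)
```

With this kind of merge, the DL1ab config stays short and the base config remains the single place where the common settings live.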

3) DL1 to DL2

lstchain_process --input events_cat_B.dl1.h5 --config config_dl2.yaml --output events.dl2.h5

Process DL1 to DL2 with a specific (source-dependent or source-independent) config_dl2.yaml.

Or directly from R0 to DL2

lstchain_process --input r0_events.fits.fz --cat-b-calibrations cat_b_calibrations.h5 --config config.yaml --output events.dl2.h5

To process directly from R0 to DL2 (with Cat-B calibrations), you can first write out the interleaved events with the 'interleaved-only' mode and compute the Cat-B calibrations with the separate tool. After that, you hand your config, the Cat-B calibrations and the RF models (maybe trained by ctapipe's ML module - ctapipe-train-...) to the processor tool and process directly to DL2.
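The flow above boils down to one entry point that runs a different chain of stages depending on where the data starts and where it should end. A minimal sketch of that idea follows. How the real tool would select stages is not specified here (it may well be driven by the config alone), so inferring the data level from file names, as well as the function and stage names, are assumptions made purely for illustration:

```python
DATA_LEVELS = ["r0", "dl1", "dl2"]


def data_level(path):
    """Guess the data level from a file name (hypothetical naming convention)."""
    name = path.lower()
    if ".dl2." in name:
        return "dl2"
    if ".dl1." in name:
        return "dl1"
    if name.endswith(".fits.fz"):
        return "r0"
    raise ValueError(f"Cannot infer data level from {path!r}")


def plan_stages(input_path, output_path):
    """Return the list of stages needed to go from input to output."""
    start = DATA_LEVELS.index(data_level(input_path))
    stop = DATA_LEVELS.index(data_level(output_path))
    if stop < start:
        raise ValueError("Output data level is below the input data level")
    if start == stop:
        # Same level in and out only makes sense for DL1 reprocessing.
        if DATA_LEVELS[start] != "dl1":
            raise ValueError("Only DL1 files can be reprocessed in place")
        return ["dl1ab"]
    levels = DATA_LEVELS[start:stop + 1]
    return [f"{a}_to_{b}" for a, b in zip(levels, levels[1:])]


print(plan_stages("r0_events.fits.fz", "events.dl1.h5"))
print(plan_stages("events_cat_A.dl1.h5", "events_cat_B.dl1.h5"))
print(plan_stages("r0_events.fits.fz", "events.dl2.h5"))
```

The point of the sketch is only that the three existing scripts map onto stages of a single tool; the direct R0 to DL2 case is then just the concatenation of two stages.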

A first basic config can be found in lstchain/data/lstchain_base_config.yaml. To contribute, please open new PRs against this branch and reference them in the associated tasks above, so that this PR is not overloaded for reviewers. I would suggest at least one PR per component.

Additional tasks:

I'm looking forward to all your ideas and comments. Feel free to edit/adapt the tasks.

moralejo commented 4 months ago

Hi @Hckjs, I don't understand why this should be implemented in lstchain a la ctapipe, rather than adapting lstchain to create (for the existing data) DL0 files that can then be fully processed with ctapipe. That is the idea behind https://github.com/cta-observatory/cta-lstchain/issues/1176 (nothing done yet as far as I know, just informal discussion off-github). Doesn't that make more sense?

maxnoe commented 4 months ago

@moralejo this is the more long standing issue of https://github.com/cta-observatory/cta-lstchain/issues/972

Are you saying we should skip that step and directly go converting DL0 files and using only ctapipe? I think this will take longer to get everything we currently do differently / additionally into ctapipe, and this exercise here is actually a large part of what is needed to do that (identify what is different, convert code into how ctapipe would do things, then start moving those to ctapipe itself).

I think this is largely independent of the question if we read already calibrated DL0 using ctapipe_io_zfits or do the calibration on the fly using ctapipe_io_lst from R0 / R1 files. This is about the steps that come after.

moralejo commented 4 months ago

> @moralejo this is the more long standing issue of #972
>
> Are you saying we should skip that step and directly go converting DL0 files and using only ctapipe?

I think that is a better long-term solution for dealing with the existing LST-1 data.

> I think this will take longer to get everything we currently do differently / additionally into ctapipe, and this exercise here is actually a large part of what is needed to do that (identify what is different, convert code into how ctapipe would do things, then start moving those to ctapipe itself).

Those additional developments (to process LST DL0 data in the way we want to, i.e. with the features currently implemented in lstchain) should be done in ctapipe. Why implement them in lstchain first and then export them to ctapipe? Let's focus instead on producing LST-1 DL0 data that can be used to improve the ctapipe pipeline for real data. I see no advantage in doing it first in lstchain, and at the same time I see a large potential for accidentally introducing problems in our working (though cumbersome) system.

> I think this is largely independent of the question if we read already calibrated DL0 using ctapipe_io_zfits or do the calibration on the fly using ctapipe_io_lst from R0 / R1 files. This is about the steps that come after.

Indeed, but if one gets to CTA-standard DL0 with lstchain, the later steps can be done in ctapipe.

Also, doing the developments in ctapipe will make it clearer that they belong (as they should) to the DPPS IKCs, rather than being LST-internal developments.