cta-observatory / protopipe

Prototype data analysis pipeline for the Cherenkov Telescope Array Observatory
https://protopipe.readthedocs.io/en/latest/

Complete refactoring of the pipeline #91

Closed HealthyPear closed 1 year ago

HealthyPear commented 3 years ago

This is essentially the final result as far as the pipeline structure is concerned. It should not change much whether the data are simulated or real.

Summary of refactored pipeline steps/tools:

You can find a more schematic view of this from my presentation at the October 2020 CTAC meeting.

UPDATE on the current situation

BONUS(es)

The individual tools are described below.

Training tool

This tool produces data that can potentially be used to train the models. I say "potentially" because the same data could also be used for DL1/DL2a studies, or split between training and testing (as is done in the current version of protopipe).
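As a rough illustration of the "partially for training and testing" case, here is a minimal sketch of a reproducible split of event identifiers into two disjoint sets. The function name `split_train_test` and its parameters are hypothetical and are not taken from protopipe or ctapipe; the real tools may split the data in a different way.

```python
import random


def split_train_test(event_ids, train_fraction=0.5, seed=0):
    """Shuffle event identifiers reproducibly and split them into two disjoint lists.

    Hypothetical helper for illustration only, not part of protopipe or ctapipe.
    """
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(event_ids)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Example: ten dummy event identifiers, half for training and half for testing.
train_ids, test_ids = split_train_test(range(10), train_fraction=0.5)
print(len(train_ids), len(test_ids))
```

Fixing the seed means the same events end up in the same subset on every run, which keeps intermediate benchmarks comparable.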

Based on : ctapipe-process

Input: simtel files or real R1 data

Output:

Modeling tool

This is currently the least defined tool, mainly because it does not strictly need to depend on ctapipe. At the moment, this step is done by protopipe.scripts.build_models.py, which is based on protopipe.mva.

The main point here is that the tool should have its own configuration system and only worry about I/O. All the internal operations will essentially be a black box from the point of view of the pipeline itself. This will make it possible to switch between ML libraries, frameworks, or testing code such as protopipe.mva, aict-tools or ctapipe.
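To make the "black box" idea more concrete, here is a minimal sketch of what a backend-agnostic modeling step could look like. Everything in it is an assumption made for illustration: `ModelBackend`, `MeanRegressor` and `run_modeling_tool` are hypothetical names, the toy backend stands in for a real library such as protopipe.mva or aict-tools, and JSON is only a placeholder for whatever commonly accepted model format gets chosen.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol, Sequence


class ModelBackend(Protocol):
    """Hypothetical interface: the only thing the pipeline needs to know about a backend."""

    def fit(self, features: Sequence[Sequence[float]], targets: Sequence[float]) -> None: ...

    def save(self, path: Path) -> None: ...


class MeanRegressor:
    """Toy backend standing in for a real ML library; it predicts the mean target."""

    def __init__(self) -> None:
        self.mean = None

    def fit(self, features, targets) -> None:
        self.mean = sum(targets) / len(targets)

    def save(self, path: Path) -> None:
        # JSON is a placeholder for the agreed model file format.
        path.write_text(json.dumps({"mean": self.mean}))


def run_modeling_tool(backend: ModelBackend, features, targets, output_path: Path) -> Path:
    """Handle only I/O and orchestration; the backend internals stay a black box."""
    backend.fit(features, targets)
    backend.save(output_path)
    return output_path


# Example run with dummy data and a temporary output directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = run_modeling_tool(
        MeanRegressor(),
        features=[[0.1], [0.2], [0.3]],
        targets=[1.0, 2.0, 3.0],
        output_path=Path(tmp_dir) / "model.json",
    )
    saved = json.loads(model_path.read_text())
```

With this kind of separation, swapping the ML library would mean writing another class with the same two methods, while the pipeline-facing function stays unchanged.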

Based on :

Input: from the Training tool

Output: model files with/without train/test data used (current protopipe includes them for intermediate benchmarking)

The only requirement on the output is that it is available in at least one commonly accepted format (we can support more later if needed), so that different code sources can be used and compared.

Data processing tool

This is basically the tool that produces DL2 files (like protopipe.scripts.write_dl2).

Based on : ctapipe-process

Input:

Output:

Performance tool

This tool translates the science case and the DL2 data into the corresponding DL3 data. It could also take care of event classes and types.

Based on : science-case-based configuration file + pyirf

Input: from the Data processing tool,

Output: data in DL3 format as in GADF + output from CTA IRF WG

maxnoe commented 1 year ago

This issue is essentially the same as #977 and seems to relate more to protopipe than ctapipe. Most of the items on the list are also now implemented in ctapipe itself. Closing.