This is essentially the final result as far as the pipeline structure is concerned. It shouldn't change much depending on the nature of the data (simulated or real).
- [x] Implement a shower processor, cta-observatory/ctapipe#1675
- [x] Generalize `ctapipe-stage1` to `ctapipe-process`, cta-observatory/ctapipe#1726
- [ ] Update `protopipe-MODELS` to read from the new ctapipe data model (see also cta-observatory/ctapipe#1755)
- [x] Energy and classification estimation in `ctapipe-process` (see cta-observatory/ctapipe#1744)
- [ ] Check that the current DL2 step doesn't break
- [ ] Check that the current GRID interface doesn't break
BONUS(es)
- [x] cta-observatory/ctapipe#1744
- [ ] LST-stereo trigger
- [ ] Image extraction quality query
The individual tools are explained below.
Training tool
This is the tool that produces data that can potentially be used to train the models.
I say "potentially" because it could of course also be used for DL1/DL2a studies, or partly for training and partly for testing (as is done in the current version of protopipe).
Based on: `ctapipe-process`
Input: simtel files or real R1 data
Output:
- full DL1 data in the format defined by ctapipe
- DL2a data, i.e. a DL2 file containing only shower-geometry information (aside from any previous metadata, of course)
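To make the DL1 vs. DL2a distinction concrete, here is a minimal sketch: DL2a amounts to the shower-geometry subset of a full DL2 table. The column names below are made up for illustration and are NOT the actual ctapipe data model.

```python
import numpy as np

# Toy "full DL2"-like table; column names are illustrative only,
# not the actual ctapipe data model.
full_dl2 = np.array(
    [(1, 83.6, 22.0, 1.5, 0.9), (2, 83.5, 22.1, 2.1, 0.4)],
    dtype=[("event_id", "i8"), ("az_deg", "f8"), ("alt_deg", "f8"),
           ("reco_energy_tev", "f8"), ("gammaness", "f8")],
)

# DL2a keeps only the shower-geometry columns (plus the event index):
# no energy estimate, no particle classification yet.
dl2a = full_dl2[["event_id", "az_deg", "alt_deg"]]
print(dl2a.dtype.names)  # ('event_id', 'az_deg', 'alt_deg')
```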
Modeling tool
This is currently the least well-defined tool, mainly because it does not strictly need to depend on ctapipe.
Currently, this step is performed by `protopipe.scripts.build_models.py`, which is based on `protopipe.mva`.
The main point here is that the tool should have its own configuration system and worry only about I/O.
All the internal operations will essentially be a black box from the point of view of the pipeline itself.
This will make it possible to switch between ML libraries/frameworks/testing code such as `protopipe.mva`, aict-tools, or ctapipe.
Based on: multiple solutions
Input: the output of the Training tool
Output: model files, with or without the train/test data used (the current protopipe includes them for intermediate benchmarking)
The only requirement on the output is that there is at least one commonly accepted format (more can be supported later if needed), so that different code sources can be used and compared.
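As an illustration of the "black box with agreed I/O" idea, the sketch below trains a toy energy regressor with scikit-learn and round-trips it through pickle. The features, target, and the choice of pickle as the "commonly accepted format" are all assumptions made for this example (ONNX or joblib files would serve the same interoperability goal).

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training sample standing in for the Training tool's output;
# features and target are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # e.g. image parameters
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # e.g. log10(E_true)

# The pipeline only sees this object through fit()/predict(): a black
# box that could equally come from protopipe.mva, aict-tools, ...
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# A single agreed serialization format (here: a pickle blob) is enough
# to let different code sources exchange and compare models.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
assert np.allclose(restored.predict(X[:5]), model.predict(X[:5]))
```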
Data processing tool
This is basically the tool that produces DL2 files (like `protopipe.scripts.write_dl2`).
Based on: `ctapipe-process`
Input:
- simtel files or real R1 data (as in the Training tool)
- trained models (for energy estimation and classification)
Output:
- full DL1 data in the format defined by ctapipe
- full DL2 data in the format defined by ctapipe
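Conceptually, this tool joins the reconstructed geometry with the outputs of the trained models. A minimal sketch, using hypothetical stand-in models and made-up quantity names:

```python
# Hypothetical stand-ins for the trained models produced by the
# Modeling tool; in reality these would be loaded from model files.
def energy_model(event):   # reconstructed energy in TeV (toy formula)
    return 10 ** event["log_intensity"] / 100.0

def classifier(event):     # "gammaness" score in [0, 1] (toy rule)
    return 0.9 if event["width_over_length"] < 0.5 else 0.2

# A DL1 event with its reconstructed geometry (all values made up).
dl1_event = {"event_id": 1, "log_intensity": 3.0, "width_over_length": 0.3,
             "alt_deg": 22.0, "az_deg": 83.6}

# Full DL2 = shower geometry + energy estimate + classification score.
dl2_event = {
    "event_id": dl1_event["event_id"],
    "alt_deg": dl1_event["alt_deg"],
    "az_deg": dl1_event["az_deg"],
    "reco_energy_tev": energy_model(dl1_event),
    "gammaness": classifier(dl1_event),
}
print(dl2_event["reco_energy_tev"], dl2_event["gammaness"])  # 10.0 0.9
```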
Performance tool
This is the tool that translates a science case and DL2 data into the corresponding DL3 data.
This tool could also take care of event classes and types.
Based on: a science-case-based configuration file + pyirf
Input (from the Data processing tool):
- full DL1 data in the format defined by ctapipe
- full DL2 data in the format defined by ctapipe
Output: data in DL3 format following GADF, plus the output from the CTA IRF WG
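To illustrate the kind of quantity this step derives, here is a sketch of an effective-area computation per energy bin, one of the IRF components pyirf produces. The event counts, binning, and thrown area are made up for the example.

```python
import numpy as np

# Made-up Monte Carlo bookkeeping per reconstructed-energy bin.
e_bin_edges = np.array([0.1, 1.0, 10.0])   # TeV
n_simulated = np.array([10_000, 10_000])   # thrown events per bin
n_selected = np.array([150, 900])          # events surviving the DL2 cuts
a_thrown = np.pi * 1_000.0**2              # thrown area in m^2 (r = 1 km)

# Effective area = selection efficiency times thrown area, per bin.
a_eff = n_selected / n_simulated * a_thrown
print(np.round(a_eff).astype(int))         # [ 47124 282743]
```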
This issue is essentially the same as #977 and seems to relate more to protopipe than to ctapipe. Most of the items on the list are also now implemented in ctapipe itself. Closing.
A more schematic view of this can be found in my presentation at the October 2020 CTAC meeting.