This is essentially the final result as far as the pipeline structure is concerned. It shouldn't change much depending on the nature of the data (simulated or real).
- [x] Implement a shower processor, cta-observatory/ctapipe#1675
- [x] Generalize `ctapipe-stage1` to `ctapipe-process`, cta-observatory/ctapipe#1726
- [ ] Update `protopipe-MODELS` to read from the new ctapipe data model (see also cta-observatory/ctapipe#1755)
- [x] Energy and classification estimation in `ctapipe-process` (see cta-observatory/ctapipe#1744)
- [ ] Check that the current DL2 step doesn't break
- [ ] Check that the current GRID interface doesn't break
BONUS(es)
- [x] cta-observatory/ctapipe#1744
- [ ] LST-stereo trigger
- [ ] Image extraction quality query
The individual tools are explained below.
Training tool
This is the tool that produces data that can potentially be used to train the models.
I say "potentially" because it could of course also be used for DL1/DL2a studies, or partly for training and partly for testing (as is done in the current version of protopipe).
Based on: `ctapipe-process`
Input: simtel files or real R1 data
Output:
- full DL1 data in the format defined by ctapipe
- DL2a data, i.e. a DL2 file containing only shower-geometry information (aside from any previous metadata, of course)
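To make the DL1 vs. DL2a distinction concrete, here is a minimal sketch: DL2a amounts to the shower-geometry subset of a full DL2 table. The column names below are made up for illustration and are NOT the actual ctapipe data model.

```python
import numpy as np

# Toy "full DL2"-like table; column names are illustrative only,
# not the actual ctapipe data model.
full_dl2 = np.array(
    [(1, 83.6, 22.0, 1.5, 0.9), (2, 83.5, 22.1, 2.1, 0.4)],
    dtype=[("event_id", "i8"), ("az_deg", "f8"), ("alt_deg", "f8"),
           ("reco_energy_tev", "f8"), ("gammaness", "f8")],
)

# DL2a keeps only the shower-geometry columns (plus the event index):
# no energy estimate, no particle classification yet.
dl2a = full_dl2[["event_id", "az_deg", "alt_deg"]]
print(dl2a.dtype.names)  # ('event_id', 'az_deg', 'alt_deg')
```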
Modeling tool
This is currently the least well-defined tool, mainly because it does not strictly need to depend on ctapipe.
Currently, this step is performed by `protopipe.scripts.build_models.py`, which is based on `protopipe.mva`.
The main point here is that the tool should have its own configuration system and worry only about I/O.
All the internal operations will essentially be a black box from the point of view of the pipeline itself.
This will make it possible to switch between ML libraries/frameworks/testing code such as `protopipe.mva`, aict-tools, or ctapipe.
Based on: multiple solutions
Input: the output of the Training tool
Output: model files, with or without the train/test data used (the current protopipe includes them for intermediate benchmarking)
The only requirement on the output is that there is at least one commonly accepted format (more can be supported later if needed), so that different code sources can be used and compared.
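As an illustration of the "black box with agreed I/O" idea, the sketch below trains a toy energy regressor with scikit-learn and round-trips it through pickle. The features, target, and the choice of pickle as the "commonly accepted format" are all assumptions made for this example (ONNX or joblib files would serve the same interoperability goal).

```python
import pickle
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy training sample standing in for the Training tool's output;
# features and target are made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # e.g. image parameters
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)  # e.g. log10(E_true)

# The pipeline only sees this object through fit()/predict(): a black
# box that could equally come from protopipe.mva, aict-tools, ...
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# A single agreed serialization format (here: a pickle blob) is enough
# to let different code sources exchange and compare models.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
assert np.allclose(restored.predict(X[:5]), model.predict(X[:5]))
```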
Data processing tool
This is basically the tool that produces DL2 files (like `protopipe.scripts.write_dl2`).
Based on: `ctapipe-process`
Input:
- simtel files or real R1 data (as in the Training tool)
- trained models (for energy estimation and classification)
Output:
- full DL1 data in the format defined by ctapipe
- full DL2 data in the format defined by ctapipe
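Conceptually, this tool joins the reconstructed geometry with the outputs of the trained models. A minimal sketch, using hypothetical stand-in models and made-up quantity names:

```python
# Hypothetical stand-ins for the trained models produced by the
# Modeling tool; in reality these would be loaded from model files.
def energy_model(event):   # reconstructed energy in TeV (toy formula)
    return 10 ** event["log_intensity"] / 100.0

def classifier(event):     # "gammaness" score in [0, 1] (toy rule)
    return 0.9 if event["width_over_length"] < 0.5 else 0.2

# A DL1 event with its reconstructed geometry (all values made up).
dl1_event = {"event_id": 1, "log_intensity": 3.0, "width_over_length": 0.3,
             "alt_deg": 22.0, "az_deg": 83.6}

# Full DL2 = shower geometry + energy estimate + classification score.
dl2_event = {
    "event_id": dl1_event["event_id"],
    "alt_deg": dl1_event["alt_deg"],
    "az_deg": dl1_event["az_deg"],
    "reco_energy_tev": energy_model(dl1_event),
    "gammaness": classifier(dl1_event),
}
print(dl2_event["reco_energy_tev"], dl2_event["gammaness"])  # 10.0 0.9
```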
Performance tool
This is the tool that translates a science case and DL2 data into the corresponding DL3 data.
This tool could also take care of event classes and types.
Based on: a science-case-based configuration file + pyirf
Input (from the Data processing tool):
- full DL1 data in the format defined by ctapipe
- full DL2 data in the format defined by ctapipe
Output: data in DL3 format following GADF, plus the output from the CTA IRF WG
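To illustrate the kind of quantity this step derives, here is a sketch of an effective-area computation per energy bin, one of the IRF components pyirf produces. The event counts, binning, and thrown area are made up for the example.

```python
import numpy as np

# Made-up Monte Carlo bookkeeping per reconstructed-energy bin.
e_bin_edges = np.array([0.1, 1.0, 10.0])   # TeV
n_simulated = np.array([10_000, 10_000])   # thrown events per bin
n_selected = np.array([150, 900])          # events surviving the DL2 cuts
a_thrown = np.pi * 1_000.0**2              # thrown area in m^2 (r = 1 km)

# Effective area = selection efficiency times thrown area, per bin.
a_eff = n_selected / n_simulated * a_thrown
print(np.round(a_eff).astype(int))         # [ 47124 282743]
```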
This issue is essentially the same as #977 and seems to relate more to protopipe than to ctapipe. Most of the items on the list are also now implemented in ctapipe itself. Closing.
A more schematic view of this can be found in my presentation at the October 2020 CTAC meeting.