Closed HealthyPear closed 3 years ago
May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.
https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml
For the output of features generated by `write_dl2`: that will be replaced by the ctapipe DL2Writer (or whatever we call it), and the philosophy will be similar to the DL1 files: always compute and store all parameters, so no configuration should be needed.
> May be a good reason to look at aict-tools for that part. It already has the input features fully configurable.
> https://github.com/fact-project/aict-tools/blob/master/examples/config_energy.yaml
Yes, this issue is of course related to the current implementation provided by `protopipe.mva`; the goal is to make protopipe easier to use from 0.5.0 onwards.
My initial intention is to allow the pipeline to host a number of ML libraries. The only requirements for this would be a common configuration system and at least one common data format (like the pickled files from scikit-learn that we use now).
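As a minimal sketch of the common-data-format idea, assuming plain pickle as the interchange format (the dict below is only a stand-in for a trained scikit-learn estimator):

```python
import io
import pickle

# Stand-in for a trained model; in protopipe this is currently a
# scikit-learn estimator pickled to disk by the training step
model = {"features": ["width", "length"], "weights": [0.3, 0.7]}

buf = io.BytesIO()
pickle.dump(model, buf)      # the training step writes the model
buf.seek(0)
restored = pickle.load(buf)  # the DL2 step reads it back unchanged
```

Any ML backend that can round-trip its models through such a format could then plug into the same pipeline.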
> For the output of features generated by `write_dl2`, that will be replaced by the ctapipe DL2Writer or whatever we call it, and the philosophy will be similar to the DL1 files: compute and store all parameters always, so no configuration should be needed.
Yes, those are no problem; here I am referring to the model features, i.e. the parameters used to train the model(s).
I saw the aict-tools config, but there they use simple, unique DL1/DL2a variables (I have no idea about more complex choices and would need time to play with it; that's why I first want to provide an easy solution with what we currently have).
What I am talking about is handling features "anonymously", so that I do not have to worry about parsing more complex analytical expressions like e.g. `atan2(cog_y - dir_y, cog_x - dir_x)` or `log10(Width*Length/Size)`.
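One way to handle such derived expressions generically is a registry of named callables applied to the dataframe. This is only a sketch: `FEATURE_FUNCS`, `add_features`, and the column names are hypothetical, not part of protopipe.

```python
import numpy as np
import pandas as pd

# Hypothetical registry: feature name -> function of the dataframe.
# Column names (cog_x, width, ...) are assumed for illustration.
FEATURE_FUNCS = {
    "disp_angle": lambda df: np.arctan2(df["cog_y"] - df["dir_y"],
                                        df["cog_x"] - df["dir_x"]),
    "log_wls": lambda df: np.log10(df["width"] * df["length"] / df["size"]),
}

def add_features(df, names):
    """Add the requested derived features as new columns."""
    for name in names:
        df[name] = FEATURE_FUNCS[name](df)
    return df

df = pd.DataFrame({"cog_x": [0.1], "cog_y": [0.2],
                   "dir_x": [0.0], "dir_y": [0.0],
                   "width": [0.05], "length": [0.12], "size": [150.0]})
df = add_features(df, ["disp_angle", "log_wls"])
```

The pipeline code then only loops over names; the analytical combinations live in one place instead of being hardcoded per script.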
Currently the modeling features:

- are defined through the configuration files (either `regressor.yaml` or `classifier.yaml`),
- when the appropriate classes in `protopipe.mva` read them, they pass through `protopipe.mva.utils.prepare_data`, which adds modified versions of the basic DL1/DL2a variables to the dataframes (e.g. log10 of variables or more complex analytical combinations),
- most importantly, they are hardcoded into `write_dl2.py`.
As they stand, these 3 steps make it difficult, if not annoying and error-prone, to experiment with different features.
My current idea is to make a dictionary, exposed to the user through the documentation, in which all existing features and any new ones (so the dictionary would be open-ended) are mapped to integers.
So something like,
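A minimal sketch of what such a mapping could look like (all feature names, integer codes, and the `resolve_features` helper below are made up for illustration):

```python
# Hypothetical open-ended mapping of integer codes to features;
# entries 4 and 5 show that derived expressions can be registered too
FEATURE_CODES = {
    1: "hillas_width",
    2: "hillas_length",
    3: "hillas_intensity",
    4: "log10(hillas_width * hillas_length / hillas_intensity)",
    5: "atan2(cog_y - dir_y, cog_x - dir_x)",
}

def resolve_features(codes):
    """Translate the list of integers from the config into features."""
    return [FEATURE_CODES[c] for c in codes]

# e.g. a configuration file listing features: [1, 2, 4]
features = resolve_features([1, 2, 4])
```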
In this way the user would specify the features in the configuration files as a list of integers, which the DL2 script would then read, mapping each feature unambiguously to the estimation step.