felker opened 4 years ago
Initially, maybe archive both ONNX and h5, since we may use either for PCS deployment.
I'd advocate saving the normalization as txt/h5 instead of npz to facilitate reading by the PCS.
Better yet, could the normalization just be added as a layer to the model post-training, so it is saved in the ONNX/h5 file? This would make implementing inference even simpler, since unnormalized data could be used as input to the deployed model.
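As a sketch of why this works (numpy-only algebra, not the project's actual code): per-channel standardization `(x - mean)/std` can be folded directly into the weights and bias of the model's first dense layer, so the exported file accepts unnormalized input with no extra layer at all. All variable names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 4, 3
W = rng.normal(size=(n_in, n_out))     # first dense layer weights
b = rng.normal(size=n_out)             # first dense layer bias
mean = rng.normal(size=n_in)           # per-channel normalization stats
std = rng.uniform(0.5, 2.0, size=n_in)

# Fold y = ((x - mean)/std) @ W + b into adjusted weights and bias:
W_fold = W / std[:, None]
b_fold = b - (mean / std) @ W

x = rng.normal(size=(5, n_in))         # unnormalized input batch
y_ref = ((x - mean) / std) @ W + b     # normalize-then-apply
y_fold = x @ W_fold + b_fold           # folded layer on raw input
assert np.allclose(y_ref, y_fold)
```

The same folding applies to a 1D conv over signal channels; for a dedicated normalization layer instead, the stats are simply stored as (non-trainable) weights inside the h5/ONNX file.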
Text files for the signal names would also be easier to use in the PCS.
I would think having some example trained models in the main repo would be useful, but maybe a larger library of models could be maintained separately?
Following discussion on Wednesday 2019-12-04 in the FRNN group meeting in San Diego, we need to start systematically saving the best trained models for a `performance_analysis.py`-style tool that would allow a user to load a trained model and easily feed it a set of shot(s) for inference, without using the bloated shot list and preprocessing pipeline that has been oriented towards training during the first phase of the project. This would enable exploratory studies about proximity to disruption, UQ, clustering, etc., and is an important intermediate step towards setting up the C-based real-time inference tool in the PCS.

As part of a broader effort towards improving the reproducibility of our workflow, these models should be stored with:
- a `.h5` file containing the tunable parameters (can be directly loaded by Keras or by C-translated inference software)
- `conf.yaml` and/or the dumped final configuration used in specifying and training the model
- the `.npz`-pickled normalizer class. For `VarNormalizer`, this would only consist of the standard deviations of each channel of each signal from the set of shots used to train the normalizer. However, it is serialized and saved as a "fat" class object that requires the entire `plasma` module to load. We might want to dump a simple non-pickled array, or even a `.txt` file, alongside the pickle, so that we have a simple file to load with the Keras-C wrapper.
- metadata about the preprocessing of `processed_shots/signal_group_*/*.npz` (order of channels and signals, sampling rates, thresholding? etc.), so that any real-time inference wrapper could apply a similar preprocessing to the incoming data.

Given the binary `.h5` and `.npz` files, we probably don't want to use VCS to store everything, but we might want to version control the plain-text metadata about the trained models. Should it live in this repository alongside the code, or in a new repository under our GitHub organization? Also, should we consider ONNX?
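To illustrate the normalizer point above, here is a minimal sketch of dumping the per-channel standard deviations (and signal names) as plain files alongside the pickle, so the Keras-C wrapper never needs to import the `plasma` module. The array values and signal names are placeholders, and the attribute layout of the real `VarNormalizer` may differ:

```python
import numpy as np

# Placeholder for the per-channel standard deviations that
# VarNormalizer computes; the real attribute name may differ.
stds = np.array([1.2, 0.8, 3.5, 0.05])
signal_names = ["ip", "q95", "li", "lm"]  # hypothetical signal names

# Simple non-pickled formats that a C wrapper can read without Python:
np.savez("normalization.npz", stds=stds)   # loadable via numpy alone
np.savetxt("normalization.txt", stds)      # one value per line, plain text
with open("signal_names.txt", "w") as f:
    f.write("\n".join(signal_names) + "\n")

# Round-trip check: the plain-text copy matches the original values.
assert np.allclose(np.loadtxt("normalization.txt"), stds)
```

Keeping the `.txt`/`.npz` dump next to the pickle also gives us exactly the kind of plain-text artifact that could be version-controlled with the model metadata.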