Currently we are assembling the pipeline for processing XBT data by hand. Now that the desired pipeline has been decided and described (by the code), it would be good to implement it properly using the scikit-learn Pipeline class. As there is some custom processing going on, this will probably involve some custom pipeline components.
With this we could encapsulate each processing step, whether custom or a standard scikit-learn object, in a pipeline, which can then be fed into a voting classifier. Steps in the pipeline could include:
select a subset of features (custom)
select a subset of the data and define the splits (custom)
hyperparameter tuning (grid search or random search) (standard scikit-learn)
cross validation (outer and inner) (standard scikit-learn, with custom folds)
calculate metrics (for score function) (custom classes using standard classes)
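
A rough sketch of how the steps above might fit together, not the actual XBT code. Everything named here is an assumption made for illustration: the `ColumnSubset` transformer, the synthetic data from `make_classification`, the `groups` array used to build the outer folds, the two classifiers in the ensemble, and the macro-recall scorer. Here the voting classifier is placed as the final step of the pipeline; wrapping several pipelines inside a voting classifier would be an equally plausible layout.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import (
    GridSearchCV,
    GroupKFold,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSubset(BaseEstimator, TransformerMixin):
    """Hypothetical custom step: keep only the listed column indices."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing is learned; the step only selects columns.
        return self

    def transform(self, X):
        X = np.asarray(X)
        return X if self.columns is None else X[:, self.columns]


# Synthetic stand-in for the real data (assumption).
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, random_state=0
)
# Hypothetical grouping variable used to define custom outer folds.
groups = np.arange(len(y)) % 6

# Feature selection -> scaling -> voting classifier, all in one Pipeline.
pipeline = Pipeline(
    steps=[
        ("select", ColumnSubset()),
        ("scale", StandardScaler()),
        (
            "vote",
            VotingClassifier(
                estimators=[
                    ("lr", LogisticRegression(max_iter=1000)),
                    ("rf", RandomForestClassifier(random_state=0)),
                ],
                voting="soft",
            ),
        ),
    ]
)

# Custom score function built from a standard metric.
scorer = make_scorer(recall_score, average="macro")

# Inner loop: hyperparameter tuning with a grid search.
search = GridSearchCV(
    pipeline,
    param_grid={
        "select__columns": [[0, 1, 2], [0, 1, 2, 3, 4, 5]],
        "vote__rf__n_estimators": [10, 20],
    },
    scoring=scorer,
    cv=StratifiedKFold(n_splits=3),
)

# Outer loop: custom folds, passed as an explicit list of (train, test) indices.
outer_folds = list(GroupKFold(n_splits=3).split(X, y, groups))
scores = cross_val_score(search, X, y, cv=outer_folds, scoring=scorer)
print(scores.mean())
```

The custom step follows the scikit-learn estimator conventions (constructor arguments stored unchanged, `fit` returning `self`), which is what lets the grid search clone it and set `select__columns` like any built-in parameter.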
Relevant scikit-learn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html https://scikit-learn.org/stable/modules/compose.html https://scikit-learn.org/stable/developers/develop.html?highlight=baseestimator