Changelog

bps_numerical.classification.classifiers.BulkTrainer is added to train any classifier N times.
bps_numerical.classification.feature_scorers.GeneRanker is added to train a single classifier N times and extract the genes common to all trained instances. This helps identify how the selected features change when the classifier is re-trained.
- Note: This is only used for a single phenotype.
- A loose implementation of the ranking process as studied here
bps_numerical.classification.feature_scorers.UnifiedFeatureScorer is added to unify features across all the training runs.
- If the same gene appears in multiple runs, its scores are averaged (using normalize=True is recommended).
- This component accepts any component of type AbstractPhenotypeClassifier (please see notebooks/bulk-trainer-gene-unificiation.ipynb).
bps_numerical.feature_selection.RandomFeatureSelector is added to randomly sample N input features. These features can then be used for downstream tasks.

With these components, we can "loosely" funnel down the relevant features. (See notebooks section of the repo)
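
To make the funnel idea concrete, here is a minimal, library-independent sketch in plain Python. It does not use the bps_numerical API: the function names (sample_features, common_genes, unify_scores), the gene names, and the scores are all hypothetical. It also assumes that a gene's score is averaged over only the runs in which it appears, and that normalization means dividing each run's scores by that run's total; the actual components may behave differently.

```python
import random
from collections import defaultdict


def sample_features(features, n, seed=None):
    """Randomly pick n distinct feature names (illustrative stand-in only)."""
    rng = random.Random(seed)
    return rng.sample(list(features), n)


def common_genes(runs):
    """Return the genes that appear in every run."""
    gene_sets = [set(run) for run in runs]
    return set.intersection(*gene_sets) if gene_sets else set()


def unify_scores(runs, normalize=True):
    """Average each gene's score over the runs in which it appears."""
    collected = defaultdict(list)
    for run in runs:
        total = sum(run.values())
        for gene, score in run.items():
            # Assumed normalization: rescale each run so its scores sum to 1,
            # which keeps runs with different score scales comparable.
            collected[gene].append(score / total if normalize and total else score)
    return {gene: sum(vals) / len(vals) for gene, vals in collected.items()}


# Hypothetical per-run feature scores from three training runs.
runs = [
    {"GENE_A": 0.6, "GENE_B": 0.4},
    {"GENE_A": 0.5, "GENE_C": 0.5},
    {"GENE_A": 0.9, "GENE_B": 0.1},
]

subset = sample_features(["GENE_A", "GENE_B", "GENE_C", "GENE_D"], 2, seed=0)
shared = common_genes(runs)      # genes present in all runs
unified = unify_scores(runs)     # one averaged score per gene
ranked = sorted(unified, key=unified.get, reverse=True)
```

In this toy data only GENE_A survives the intersection, while the unified scores still rank genes that appeared in just some of the runs.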

TODO

cluster goodness calculation
proper readme
proper documentation for the overall process for gene identification
experiments with multi-label xgboost models, which are available from xgboost 1.6+ and are still at an early stage. (Maybe we can try sklearn's MultiOutputClassifier.)
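
As a rough starting point for the MultiOutputClassifier idea, the sketch below wraps a base estimator so that one classifier is fitted per label column. The data is synthetic, and LogisticRegression is only a stand-in for whichever base classifier we end up using (for example an xgboost model); nothing here reflects an existing implementation in this repo.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data: 60 samples, 5 features, 2 binary labels (hypothetical phenotypes).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
Y = np.column_stack([
    (X[:, 0] > 0).astype(int),
    (X[:, 1] > 0).astype(int),
])

# MultiOutputClassifier fits one independent copy of the base estimator per label column.
clf = MultiOutputClassifier(LogisticRegression())
clf.fit(X, Y)

pred = clf.predict(X)             # shape: (n_samples, n_labels)
per_label_models = clf.estimators_
```

Because each label gets its own fitted estimator, per-label feature scores could in principle be read from the entries of estimators_ and then unified in the same way as the single-phenotype runs.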

cc: @code-geek @xhagrg