This is a big (and exhaustive) PR that consolidates all the experimental features and branches.
The README.md illustrates all the components available in this PR. For brevity, here are the changes:
Major Changes
bps_numerical.preprocessing.DataLoader is added to aid the data loading and preprocessing pipeline
bps_numerical.feature_selection.BestCandidateFeatureSelector is added to compute the best candidate feature gene from each cluster in the post-clustering pipeline (see bps_numerical.clustering.CorrelationClusterer for clustering; see also this outdated/halted PR: https://github.com/NASA-IMPACT/bps-numerical/pull/6)
bps_numerical.classification.tuner.BayesTuner component is added to perform Bayesian search for xgboost models (needed for the bps_numerical.classification.feature_scorers.GeneRanker pipeline)
bps_numerical.classification.feature_scorers.GeneRanker is available as the main ranking pipeline which gives intersection-based ranked genes
bps_numerical.classification.feature_scorers.MeanReciprocalRanker is added for the MRR-based ranking algorithm. This makes use of the N models obtained from the GeneRanker pipeline
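For reviewers unfamiliar with MRR, the idea behind the last item can be sketched in a few lines. This is only an illustration of mean reciprocal rank over several per-model gene rankings, not the MeanReciprocalRanker implementation in this PR. The function names (`mean_reciprocal_rank`, `rank_genes`) and the toy gene names are hypothetical, and treating a gene that is missing from a ranking as contributing 0 is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Score each gene by the mean of 1/rank over all rankings.

    Assumption: a gene absent from a ranking contributes 0 for that ranking,
    and each ranking lists a gene at most once.
    """
    if not rankings:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top gene of a model gets 1/1, the next 1/2, ...
        for position, gene in enumerate(ranking, start=1):
            totals[gene] += 1.0 / position
    n_models = len(rankings)
    return {gene: total / n_models for gene, total in totals.items()}


def rank_genes(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Order genes by descending MRR score, breaking ties by name."""
    scores = mean_reciprocal_rank(rankings)
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Toy rankings, standing in for the per-model gene orderings of N = 3 models.
rankings = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA", "geneD"],
    ["geneA", "geneC", "geneB"],
]
scores = mean_reciprocal_rank(rankings)
ordered = rank_genes(rankings)
```

A gene that sits near the top in most models ends up with a high score even if one model ranks it lower, which is the usual motivation for averaging reciprocal ranks instead of raw ranks.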
Minor and Experimental Changes
notebooks/gene-ranker.pipeline.ipynb is updated to accommodate all the new components and pipelines
setup.py is added to aid in the local installation of the bps_numerical package
bps_numerical.clustering.FeatureGrouper is added to generate clusters based on simple correlation-based thresholding. This might not work because correlation is not transitive: A can correlate with B and B with C while A and C are barely correlated (Ref: https://terrytao.wordpress.com/2014/06/05/when-is-correlation-transitive/)
TODO
cc: @xhagrg @muthukumaranR