MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
This PR includes the learners/analysis package. This package includes classes for feature importance analysis as described in [1].
New classes/interfaces:
ClassifierWithFeatureImportance: This meta-algorithm wraps and runs a classifier that is also capable of outputting feature importances.
FeatureImportanceClassifier: This interface defines the methods to be implemented on a Classifier to allow it to produce feature importances.
FeatureImportanceHoeffdingTree: This class uses the HoeffdingTree structure (composition) to produce feature importances. This class does not interfere with the training algorithm of the underlying HoeffdingTree model. Any subclass of the HoeffdingTree class can be set as the treeLearnerOption.
FeatureImportanceHoeffdingTreeEnsemble: This class produces feature importances from ensembles of HoeffdingTree models and their subclasses. This class does not interfere with the training algorithm of the underlying ensemble model. The base learner of the ensemble model must be either a HoeffdingTree or one of its subclasses.
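As a rough illustration of the composition idea described above, the following is a minimal, self-contained sketch and not the code in this PR. It assumes a mean-decrease-in-impurity style score and averages tree scores across an ensemble; the names `ImportanceProvider`, `Node`, `TreeImportance`, and `averageOver` are hypothetical and do not correspond to MOA classes. The scorer only reads the tree structure, so it cannot affect training.

```java
import java.util.ArrayList;
import java.util.List;

public class FeatureImportanceSketch {

    /** Hypothetical stand-in for an interface that exposes feature importances. */
    public interface ImportanceProvider {
        double[] featureImportances(boolean normalize);
    }

    /** Hypothetical read-only tree node; a node without children is a leaf. */
    public static final class Node {
        final int featureIndex;   // split feature, ignored for leaves
        final double weight;      // weight of the instances that reached this node
        final double impurity;    // impurity (e.g. entropy) observed at this node
        final List<Node> children = new ArrayList<>();

        public Node(int featureIndex, double weight, double impurity) {
            this.featureIndex = featureIndex;
            this.weight = weight;
            this.impurity = impurity;
        }

        public Node addChild(Node child) {
            children.add(child);
            return this;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Wraps a tree (composition) and scores features without modifying it. */
    public static final class TreeImportance implements ImportanceProvider {
        private final Node root;
        private final int numFeatures;

        public TreeImportance(Node root, int numFeatures) {
            this.root = root;
            this.numFeatures = numFeatures;
        }

        @Override
        public double[] featureImportances(boolean normalize) {
            double[] scores = new double[numFeatures];
            accumulate(root, scores);
            if (normalize) {
                double sum = 0.0;
                for (double s : scores) sum += s;
                if (sum > 0.0) {
                    for (int i = 0; i < scores.length; i++) scores[i] /= sum;
                }
            }
            return scores;
        }

        // Adds the weighted impurity decrease of every split to its feature's score.
        private void accumulate(Node node, double[] scores) {
            if (node.isLeaf()) return;
            double childImpurity = 0.0;
            for (Node child : node.children) childImpurity += child.weight * child.impurity;
            double decrease = node.weight * node.impurity - childImpurity;
            scores[node.featureIndex] += Math.max(0.0, decrease);
            for (Node child : node.children) accumulate(child, scores);
        }
    }

    /** Averages the normalized scores of all ensemble members (assumed aggregation). */
    public static double[] averageOver(List<? extends ImportanceProvider> members, int numFeatures) {
        double[] mean = new double[numFeatures];
        if (members.isEmpty()) return mean;
        for (ImportanceProvider member : members) {
            double[] scores = member.featureImportances(true);
            for (int i = 0; i < numFeatures; i++) mean[i] += scores[i] / members.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        Node inner = new Node(2, 6.0, 0.5)
                .addChild(new Node(-1, 3.0, 0.0))
                .addChild(new Node(-1, 3.0, 0.0));
        Node root = new Node(0, 10.0, 1.0)
                .addChild(new Node(-1, 4.0, 0.0))
                .addChild(inner);
        double[] scores = new TreeImportance(root, 3).featureImportances(true);
        System.out.println(java.util.Arrays.toString(scores));
    }
}
```

In the toy tree built in `main`, the root split on feature 0 contributes a decrease of 7 and the inner split on feature 2 contributes 3, so the normalized scores are 0.7, 0, and 0.3.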
Modified classes:
AdaptiveRandomForest: Implemented the getSublearners() method.
StreamingRandomPatches: Implemented the getSublearners() method.
HoeffdingTree: Added methods such as getObservedClassDistributionAtLeavesReachableThroughThisNode() to assist in the feature importance calculations. Relaxed the access modifiers of some methods so the analysis classes can call them. The changes do not interfere with training.
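To make the purpose of a method like getObservedClassDistributionAtLeavesReachableThroughThisNode() more concrete, here is a hedged sketch of what its name suggests: summing the class counts stored at every leaf below a given node. The `Node` type, `classDistributionAtReachableLeaves`, and `entropy` are hypothetical illustrations and not the HoeffdingTree code from this PR. Using the aggregated counts to compute an impurity at a split node is one plausible use, stated here as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafDistributionSketch {

    /** Hypothetical node; class counts are only meaningful at leaves. */
    public static final class Node {
        final double[] classCounts;
        final List<Node> children = new ArrayList<>();

        public Node(double... classCounts) {
            this.classCounts = classCounts;
        }

        public Node addChild(Node child) {
            children.add(child);
            return this;
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    /** Sums the class counts of every leaf reachable from the given node. */
    public static double[] classDistributionAtReachableLeaves(Node node, int numClasses) {
        double[] total = new double[numClasses];
        collect(node, total);
        return total;
    }

    // Read-only depth-first traversal; the tree itself is never modified.
    private static void collect(Node node, double[] total) {
        if (node.isLeaf()) {
            int n = Math.min(node.classCounts.length, total.length);
            for (int i = 0; i < n; i++) total[i] += node.classCounts[i];
            return;
        }
        for (Node child : node.children) collect(child, total);
    }

    /** Shannon entropy (in bits) of a vector of class counts. */
    public static double entropy(double[] counts) {
        double sum = 0.0;
        for (double c : counts) sum += c;
        if (sum <= 0.0) return 0.0;
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / sum;
                h -= p * (Math.log(p) / Math.log(2.0));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        Node inner = new Node().addChild(new Node(2, 0)).addChild(new Node(0, 4));
        Node root = new Node().addChild(new Node(3, 1)).addChild(inner);
        double[] dist = classDistributionAtReachableLeaves(root, 2);
        System.out.println(java.util.Arrays.toString(dist) + " entropy=" + entropy(dist));
    }
}
```

Because the traversal only reads counts that the leaves already hold, a helper of this kind is consistent with the statement that the changes do not interfere with training.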
[1] Feature Scoring using Tree-Based Ensembles for Evolving Data Streams. H M Gomes, R Mello, B Pfahringer, A Bifet. IEEE Big Data, 2019.