MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
This update introduces a separate tab for Active Learning. It contains newly implemented Active Learning algorithms as well as some extensions to the graphical user interface.
Active Learning
Two Active Learning algorithms are currently supported; further ones will be added in the future:
ALRandom: Decides randomly whether an instance should be used for training.
ALZliobaite2011: This class implements four active learning strategies for streaming data that explicitly handle concept drift. They are based on randomization, fixed uncertainty, dynamic allocation of labeling effort over time, and randomization of the search space (Zliobaite et al., 2011). It also implements the Selective Sampling strategy, which is adapted from Cesa-Bianchi et al. (2006) and uses a variable labeling threshold.
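To make these labeling decisions more concrete, here is a minimal Java sketch of a random strategy and three uncertainty-based strategies in the spirit of Zliobaite et al. (2011). It is a simplified illustration, not the source of MOA's ALRandom or ALZliobaite2011: the interface LabelingStrategy, all class names, and the parameters threshold, step and delta are hypothetical; the budget-spending check that the paper applies before the uncertainty test is omitted; and Selective Sampling is not shown. Each strategy receives the classifier's highest posterior probability for an incoming instance and returns whether its true label should be requested.

```java
import java.util.Random;

/** Hypothetical interface (not MOA's API): decides whether to request an instance's true label. */
interface LabelingStrategy {
    /** @param maxPosterior the classifier's highest class probability for the instance */
    boolean shouldLabel(double maxPosterior);
}

/** Random strategy: request each label with probability equal to the budget. */
final class RandomStrategy implements LabelingStrategy {
    private final double budget;
    private final Random rng;

    RandomStrategy(double budget, long seed) {
        this.budget = budget;
        this.rng = new Random(seed);
    }

    @Override
    public boolean shouldLabel(double maxPosterior) {
        return rng.nextDouble() < budget; // ignores the classifier's confidence entirely
    }
}

/** Fixed uncertainty: request a label when confidence is below a constant threshold. */
final class FixedUncertaintyStrategy implements LabelingStrategy {
    private final double threshold;

    FixedUncertaintyStrategy(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean shouldLabel(double maxPosterior) {
        return maxPosterior < threshold;
    }
}

/** Variable uncertainty: the threshold tightens after each request and relaxes otherwise. */
class VariableUncertaintyStrategy implements LabelingStrategy {
    protected double threshold;
    private final double step;

    VariableUncertaintyStrategy(double initialThreshold, double step) {
        this.threshold = initialThreshold;
        this.step = step;
    }

    /** Value compared against the confidence; the randomized variant overrides it. */
    protected double effectiveThreshold() {
        return threshold;
    }

    @Override
    public boolean shouldLabel(double maxPosterior) {
        if (maxPosterior < effectiveThreshold()) {
            threshold *= (1 - step); // labeled: require lower confidence next time
            return true;
        }
        threshold *= (1 + step); // skipped: become more willing to ask
        return false;
    }

    double currentThreshold() {
        return threshold;
    }
}

/** Randomized variable uncertainty: scales the threshold by noise drawn from N(1, delta). */
final class RandomizedVariableUncertaintyStrategy extends VariableUncertaintyStrategy {
    private final double delta;
    private final Random rng;

    RandomizedVariableUncertaintyStrategy(double initialThreshold, double step, double delta, long seed) {
        super(initialThreshold, step);
        this.delta = delta;
        this.rng = new Random(seed);
    }

    @Override
    protected double effectiveThreshold() {
        return threshold * (1 + delta * rng.nextGaussian());
    }
}

final class LabelingStrategyDemo {
    public static void main(String[] args) {
        LabelingStrategy strategy = new VariableUncertaintyStrategy(0.9, 0.01);
        double[] confidences = {0.95, 0.55, 0.80, 0.99, 0.60};
        for (double c : confidences) {
            System.out.println("confidence " + c + " -> label requested: " + strategy.shouldLabel(c));
        }
    }
}
```

The multiplicative update is what spreads labeling effort over time in this sketch: each request shrinks the threshold so that only less confident instances are queried next, while each skipped instance relaxes it again.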
Graphical User Interface Extensions
The tab's graphical interface is based on the Classification tab, but some additional functionality has been added:
Result table
The result preview has been changed from a plain text field showing CSV data to a proper table.
Hierarchy of Tasks
To enable convenient, fast evaluation with reliable results, we introduced new tasks organized in a three-level hierarchy:
ALPrequentialEvaluationTask: Performs prequential evaluation for any chosen active learner.
ALMultiParamTask: Compares different parameter settings of the same algorithm by running multiple ALPrequentialEvaluationTasks.
ALPartitionEvaluationTask: Splits a data stream into several partitions and runs an ALMultiParamTask on each one. This allows for a cross-validation-like evaluation.
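The three task levels nest inside each other, which a short Java sketch can illustrate. Everything here is hypothetical rather than MOA's task API: the ActiveLearner interface, the toy RandomMajorityLearner (binary labels, majority-class prediction, random queries), the method names, and the choice of contiguous, roughly equal partitions are assumptions made for the example. Prequential evaluation is modeled as test-then-train, where the learner is trained only on instances whose label it requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.DoubleFunction;

/** One labeled instance from the stream. */
record Instance(double[] features, int label) {}

/** Outcome of a single prequential run. */
record RunResult(double accuracy, double acquisitionRate) {}

/** Hypothetical active-learner interface; not MOA's API. */
interface ActiveLearner {
    int predict(double[] features);
    boolean wantsLabel(double[] features); // the active-learning query decision
    void train(double[] features, int label);
}

/** Toy binary learner: predicts the majority label seen so far and queries at random. */
final class RandomMajorityLearner implements ActiveLearner {
    private final int[] counts = new int[2];
    private final double budget;
    private final Random rng;

    RandomMajorityLearner(double budget, long seed) {
        this.budget = budget;
        this.rng = new Random(seed);
    }

    @Override public int predict(double[] features) { return counts[1] > counts[0] ? 1 : 0; }
    @Override public boolean wantsLabel(double[] features) { return rng.nextDouble() < budget; }
    @Override public void train(double[] features, int label) { counts[label]++; }
}

final class TaskHierarchySketch {
    /** Level 1: prequential (test-then-train) evaluation; only queried labels are used for training. */
    static RunResult prequential(List<Instance> stream, ActiveLearner learner) {
        int correct = 0;
        int labeled = 0;
        for (Instance inst : stream) {
            if (learner.predict(inst.features()) == inst.label()) correct++; // test first
            if (learner.wantsLabel(inst.features())) {                      // then train, if queried
                learner.train(inst.features(), inst.label());
                labeled++;
            }
        }
        int n = stream.size(); // assumes a non-empty stream
        return new RunResult((double) correct / n, (double) labeled / n);
    }

    /** Level 2: one prequential run per parameter value, each with a fresh learner. */
    static List<RunResult> multiParam(List<Instance> stream, double[] paramValues,
                                      DoubleFunction<ActiveLearner> factory) {
        List<RunResult> results = new ArrayList<>();
        for (double p : paramValues) {
            results.add(prequential(stream, factory.apply(p)));
        }
        return results;
    }

    /** Level 3: split the stream into contiguous partitions and run level 2 on each. */
    static List<List<RunResult>> partitionEvaluation(List<Instance> stream, int numPartitions,
            double[] paramValues, DoubleFunction<ActiveLearner> factory) {
        List<List<RunResult>> perPartition = new ArrayList<>();
        int size = stream.size() / numPartitions;
        for (int k = 0; k < numPartitions; k++) {
            int from = k * size;
            int to = (k == numPartitions - 1) ? stream.size() : from + size; // last takes the remainder
            perPartition.add(multiParam(stream.subList(from, to), paramValues, factory));
        }
        return perPartition;
    }

    public static void main(String[] args) {
        Random gen = new Random(7);
        List<Instance> stream = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            stream.add(new Instance(new double[0], gen.nextDouble() < 0.7 ? 1 : 0));
        }
        double[] budgets = {0.05, 0.1, 0.5};
        List<List<RunResult>> results =
                partitionEvaluation(stream, 5, budgets, b -> new RandomMajorityLearner(b, 42L));
        for (int k = 0; k < results.size(); k++) {
            System.out.println("partition " + k + ": " + results.get(k));
        }
    }
}
```

Because the sketch creates a fresh learner for every parameter value and every partition, no run shares state with another, which keeps the per-partition results independent of each other.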
The tree structure of these tasks, together with their parameters, is now also displayed in the task overview panel at the top of the window.
Evaluation
The new task hierarchy requires an extended evaluation scheme. In all evaluation graphs, color coding is used to distinguish between different runs.
Each type of task has its own evaluation style:
ALPrequentialEvaluationTask: Results for one single experiment are shown.
ALMultiParamTask: Results for all parameter configurations are shown in one graph.
ALPartitionEvaluationTask: For each parameter configuration, the mean and standard deviation calculated over all folds are shown.
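The fold statistics of an ALPartitionEvaluationTask amount to a per-configuration mean and standard deviation over folds. The following Java sketch computes both from a hypothetical measure[fold][config] matrix. Whether MOA uses the sample (n - 1) or the population (n) standard deviation is not stated here, so the sample version below is an assumption, and the class name FoldStatistics and the accuracy values in main are made up for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper, not MOA code: summarizes one measure across folds. */
final class FoldStatistics {
    /** Arithmetic mean of one measure across folds. */
    static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Sample standard deviation (n - 1 in the denominator); 0 for fewer than two folds. */
    static double stdDev(double[] values) {
        if (values.length < 2) return 0.0;
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) sumSq += (v - m) * (v - m);
        return Math.sqrt(sumSq / (values.length - 1));
    }

    /**
     * @param measure measure[fold][config], e.g. the final accuracy of each configuration in each fold
     * @return summary[config] = {mean, standard deviation} over all folds
     */
    static double[][] summarize(double[][] measure) {
        int numConfigs = measure[0].length;
        double[][] summary = new double[numConfigs][];
        for (int c = 0; c < numConfigs; c++) {
            double[] column = new double[measure.length];
            for (int f = 0; f < measure.length; f++) column[f] = measure[f][c];
            summary[c] = new double[] {mean(column), stdDev(column)};
        }
        return summary;
    }

    public static void main(String[] args) {
        // Hypothetical accuracies: 3 folds x 2 parameter configurations.
        double[][] accuracy = {{0.81, 0.70}, {0.85, 0.72}, {0.83, 0.74}};
        double[][] summary = summarize(accuracy);
        for (int c = 0; c < summary.length; c++) {
            System.out.printf("config %d: mean %.3f, std %.3f%n", c, summary[c][0], summary[c][1]);
        }
    }
}
```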
For ALMultiParamTasks and ALPartitionEvaluationTasks, two further types of evaluation are available:
Any selected measure can be inspected in relation to the value of the varied parameter.
The same can be done with respect to the true label acquisition rate (often called the budget), since this measure is of central importance in active learning applications.
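Both additional views can be thought of as turning per-configuration summaries into curves: one keyed by the varied parameter, one keyed by the label acquisition rate. The Java sketch below assumes the acquisition rate is the fraction of processed instances whose true label was requested; the records ConfigSummary and Point, the class MeasureSeries, and the numbers in main are hypothetical and only illustrate the idea, not MOA's plotting code.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical summary of one parameter configuration, as produced by the evaluation tasks. */
record ConfigSummary(double paramValue, double acquisitionRate, double measure) {}

/** One point of a plotted curve. */
record Point(double x, double y) {}

/** Hypothetical helper, not MOA code: builds the two additional evaluation curves. */
final class MeasureSeries {
    /** Assumed definition: fraction of processed instances whose true label was requested. */
    static double acquisitionRate(long labelsAcquired, long instancesSeen) {
        return instancesSeen == 0 ? 0.0 : (double) labelsAcquired / instancesSeen;
    }

    /** Curve of the selected measure against the varied parameter. */
    static List<Point> byParameter(List<ConfigSummary> configs) {
        return configs.stream()
                .sorted(Comparator.comparingDouble(ConfigSummary::paramValue))
                .map(c -> new Point(c.paramValue(), c.measure()))
                .toList();
    }

    /** Curve of the selected measure against the label acquisition rate. */
    static List<Point> byAcquisitionRate(List<ConfigSummary> configs) {
        return configs.stream()
                .sorted(Comparator.comparingDouble(ConfigSummary::acquisitionRate))
                .map(c -> new Point(c.acquisitionRate(), c.measure()))
                .toList();
    }

    public static void main(String[] args) {
        // Hypothetical results for three threshold settings of one active learner.
        List<ConfigSummary> configs = List.of(
                new ConfigSummary(0.9, 0.30, 0.85),
                new ConfigSummary(0.5, 0.05, 0.70),
                new ConfigSummary(0.7, 0.12, 0.80));
        System.out.println("measure vs. parameter:        " + byParameter(configs));
        System.out.println("measure vs. acquisition rate: " + byAcquisitionRate(configs));
    }
}
```

Plotting against the acquisition rate rather than the raw parameter value can make runs easier to compare at equal labeling cost, even when their parameters have different scales or meanings.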