Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0

Added new ability to automatically generate multiple ARFF files and evaluate a learner on each #115

Closed: richard-moulton closed this 6 years ago

richard-moulton commented 6 years ago

I have written a new task, WriteMultipleStreamsToARFF, which automates multiple WriteStreamToARFFFile tasks. It allows multiple streams to be generated by selecting a stream generator and passing it a series of different seed values to the generator's random processes. The resulting streams are then saved in separate ARFF files.
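To illustrate the seed-per-file idea, here is a minimal, dependency-free sketch. It does not use MOA's classes or the actual WriteMultipleStreamsToARFF code; the class `SeededArffSketch`, its methods, the two numeric attributes and the `_seed<N>.arff` naming scheme are all assumptions made for illustration. A `java.util.Random` stands in for a stream generator's random process.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical stand-in for the idea behind WriteMultipleStreamsToARFF; not MOA code.
final class SeededArffSketch {

    // Renders a tiny two-attribute stream as ARFF text, driven only by the seed.
    static String renderArff(long seed, int numInstances) {
        Random rng = new Random(seed); // stands in for the generator's random process
        StringBuilder sb = new StringBuilder();
        sb.append("@relation sketch_seed_").append(seed).append("\n\n");
        sb.append("@attribute x numeric\n");
        sb.append("@attribute y numeric\n\n");
        sb.append("@data\n");
        for (int i = 0; i < numInstances; i++) {
            sb.append(String.format(Locale.ROOT, "%.4f,%.4f\n",
                    rng.nextDouble(), rng.nextDouble()));
        }
        return sb.toString();
    }

    // Writes one ARFF file per seed and returns the paths in seed order.
    // The "<baseName>_seed<seed>.arff" naming scheme is an assumption.
    static List<Path> writeStreams(Path dir, String baseName, long[] seeds,
                                   int numInstances) throws IOException {
        Files.createDirectories(dir);
        List<Path> written = new ArrayList<>();
        for (long seed : seeds) {
            Path out = dir.resolve(baseName + "_seed" + seed + ".arff");
            Files.write(out, renderArff(seed, numInstances).getBytes(StandardCharsets.UTF_8));
            written.add(out);
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("arff-sketch");
        for (Path p : writeStreams(dir, "stream", new long[] {1L, 2L, 3L}, 5)) {
            System.out.println(p);
        }
    }
}
```

Because each file depends only on its seed, rerunning with the same seed list should reproduce the same set of streams.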

I have also written a new task, EvaluateMultipleClusterings, which automates multiple EvaluateClustering tasks. Given a sequence of ARFF files generated using WriteMultipleStreamsToARFF, it calls EvaluateClustering for the given learner on each ARFF file. The resulting evaluations are then saved in separate CSV files. It is analogous to the RunStreamTasks task, which automates classifier tasks using different classifier parameters.
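The one-evaluation-per-file loop can be sketched roughly as follows. This is not the EvaluateMultipleClusterings implementation: the class `MultiEvalSketch`, the `csvNameFor` naming rule and the `BiConsumer` callback that stands in for a single EvaluateClustering run are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

// Hypothetical sketch of running one evaluation per ARFF file; not MOA code.
final class MultiEvalSketch {

    // Maps "stream_seed1.arff" to "stream_seed1.csv" (assumed naming rule);
    // names without an .arff extension simply get ".csv" appended.
    static String csvNameFor(String arffName) {
        if (arffName.toLowerCase(Locale.ROOT).endsWith(".arff")) {
            return arffName.substring(0, arffName.length() - ".arff".length()) + ".csv";
        }
        return arffName + ".csv";
    }

    // Calls evaluateOne(inputArff, outputCsv) once per file, in order,
    // and returns the CSV names that were requested.
    static List<String> evaluateAll(List<String> arffFiles,
                                    BiConsumer<String, String> evaluateOne) {
        List<String> outputs = new ArrayList<>();
        for (String arff : arffFiles) {
            String csv = csvNameFor(arff);
            evaluateOne.accept(arff, csv); // stand-in for one EvaluateClustering run
            outputs.add(csv);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("stream_seed1.arff", "stream_seed2.arff");
        evaluateAll(inputs, (in, out) -> System.out.println("evaluate " + in + " -> " + out));
    }
}
```

Passing the single-file evaluation in as a callback keeps the loop independent of the learner, so the same learner configuration is reused for every file.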

In the process of writing these tasks I discovered and addressed two issues:

#113 - Replaced the measureCollectionType framework with flagOptions and a boolean array that tracks which options were chosen. Any of the MeasureCollections available for clustering algorithms (see Note 1) can now be selected, and each selected collection is properly added and enabled for use during the EvaluateClustering task.

#114 - Ensured that the BatchCmd constructor recognizes a totalInstances argument of -1 as meaning no limit. In that case, BatchCmd sets its internal totalInstances variable to the maximum integer value.
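The -1 handling amounts to normalizing a sentinel value once, at construction time. A minimal sketch of that pattern, assuming a hypothetical class `InstanceLimitSketch` rather than the real BatchCmd code:

```java
// Hypothetical sketch of the "-1 means no limit" normalization; not the BatchCmd source.
final class InstanceLimitSketch {

    static final int NO_LIMIT = -1; // sentinel passed by callers that want no cap

    private final int totalInstances;

    InstanceLimitSketch(int totalInstances) {
        // Normalize once so later comparisons need no special case for -1.
        this.totalInstances = effectiveLimit(totalInstances);
    }

    static int effectiveLimit(int requested) {
        return requested == NO_LIMIT ? Integer.MAX_VALUE : requested;
    }

    // Number of instances that would be processed from a stream of the given size.
    int instancesToProcess(int available) {
        return Math.min(available, totalInstances);
    }

    public static void main(String[] args) {
        System.out.println(new InstanceLimitSketch(-1).instancesToProcess(1000));
        System.out.println(new InstanceLimitSketch(250).instancesToProcess(1000));
    }
}
```

Without the normalization, a loop condition such as `processed < totalInstances` would be false from the start when totalInstances is -1, so nothing would be processed.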

Note 1: The OutlierPerformance and ChangeDetectionMeasures measure collections are both assessed as applicable to clustering algorithms when building the moa.gui.clustertab.ClusteringEvalPanel. Neither implements anything in its evaluateClustering method, so I omitted both from the options selectable for the EvaluateClustering and EvaluateMultipleClusterings tasks.
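As a rough illustration of the flag-plus-boolean-array approach from #113 combined with the omission described in Note 1, consider the sketch below. It is not MOA code: the class `MeasureSelectionSketch` and the names `MeasureA` and `MeasureB` are placeholders, and only the two excluded names are taken from the note.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of flag-based measure selection; not MOA code.
final class MeasureSelectionSketch {

    // The two collections named in Note 1 as doing nothing in evaluateClustering.
    static final List<String> EXCLUDED =
            Arrays.asList("OutlierPerformance", "ChangeDetectionMeasures");

    // Filters the full candidate list down to the collections offered as flags.
    static List<String> selectable(List<String> allCollections) {
        List<String> result = new ArrayList<>();
        for (String name : allCollections) {
            if (!EXCLUDED.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    // flags[i] records whether the i-th selectable collection was chosen.
    static List<String> enabled(List<String> selectable, boolean[] flags) {
        if (flags.length != selectable.size()) {
            throw new IllegalArgumentException("one flag is required per selectable collection");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                result.add(selectable.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "MeasureA" and "MeasureB" are placeholder names, not real MOA classes.
        List<String> all = Arrays.asList(
                "MeasureA", "OutlierPerformance", "MeasureB", "ChangeDetectionMeasures");
        List<String> offered = selectable(all);
        System.out.println(enabled(offered, new boolean[] {false, true}));
    }
}
```

Keeping one boolean per selectable collection means that any combination can be enabled, rather than a fixed set of predefined groupings.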