MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
measureByteSize() gets called twice in EvaluatePrequential on every stats collection cycle, specifically at:
double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
and at
learningCurve.insertEntry(new LearningEvaluation(
through LearningEvaluation()'s call to model.getModelMeasurements().
This could result in high computing overhead during periodic stats collection for ensemble methods like SRP and ARF with a large number of base learners (100).
Simple test with default SRP parameters and default stream:
moa.DoTask "EvaluatePrequential -l meta.StreamingRandomPatches -i 100000 -f 10000 -q 10000"
MOA master 6eacf9b
Task completed in 6m24s (CPU time)
Time after commenting the first occurrence:
- double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
+ double RAMHoursIncrement = 0.0 / (1024.0 * 1024.0 * 1024.0); //GBs
  RAMHoursIncrement *= (timeIncrement / 3600.0); //Hours
  RAMHours += RAMHoursIncrement;
  lastEvaluateStartTime = evaluateTime;
Time after commenting both the occurrences:
- measureByteSize()));
+ 0.0));
  Measurement[] modelMeasurements = getModelMeasurementsImpl();
  if (modelMeasurements != null) {
  measurementList.addAll(Arrays.asList(modelMeasurements));
diff --git a/moa/src/main/java/moa/tasks/EvaluatePrequential.java b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
index 8003489..16b51c8 100644
--- a/moa/src/main/java/moa/tasks/EvaluatePrequential.java
+++ b/moa/src/main/java/moa/tasks/EvaluatePrequential.java
@@ -213,7 +213,7 @@ public class EvaluatePrequential extends ClassificationMainTask implements Capab
  long evaluateTime = TimingUtils.getNanoCPUTimeOfCurrentThread();
  double time = TimingUtils.nanoTimeToSeconds(evaluateTime - evaluateStartTime);
  double timeIncrement = TimingUtils.nanoTimeToSeconds(evaluateTime - lastEvaluateStartTime);
- double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); //GBs
+ double RAMHoursIncrement = 0.0 / (1024.0 * 1024.0 * 1024.0); //GBs
  RAMHoursIncrement *= (timeIncrement / 3600.0); //Hours
  RAMHours += RAMHoursIncrement;
  lastEvaluateStartTime = evaluateTime;
We could pass the already calculated byte size to getModelMeasurementsImpl().
The same happens with EvaluateInterleavedTestThenTrain as well.
How to run the tests: test.txt
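To illustrate the idea of measuring the byte size once per stats collection cycle and reusing it, here is a minimal, self-contained sketch. It does not use MOA classes: CountingLearner, getModelMeasurements(long) and collectStats are hypothetical names invented for this example, and the real change would have to fit MOA's existing method signatures. The RAM-hours arithmetic mirrors the snippet quoted above (bytes converted to GB, multiplied by elapsed hours).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MeasureOnceSketch {

    /** Hypothetical stand-in for an ensemble learner; NOT a MOA class. */
    static class CountingLearner {
        int measureCalls = 0;
        private final long[] baseLearnerSizes;

        CountingLearner(int numBaseLearners, long bytesPerLearner) {
            baseLearnerSizes = new long[numBaseLearners];
            Arrays.fill(baseLearnerSizes, bytesPerLearner);
        }

        /** Stands in for an expensive size measurement over all base learners. */
        long measureByteSize() {
            measureCalls++;
            long total = 0;
            for (long size : baseLearnerSizes) {
                total += size;
            }
            return total;
        }

        /** Hypothetical overload: accepts a precomputed size instead of re-measuring. */
        List<String> getModelMeasurements(long byteSize) {
            List<String> measurements = new ArrayList<>();
            measurements.add("model size (bytes)=" + byteSize);
            return measurements;
        }
    }

    /** Size in GB multiplied by elapsed hours, as in the quoted snippet. */
    static double ramHoursIncrement(long byteSize, double timeIncrementSeconds) {
        double gigabytes = byteSize / (1024.0 * 1024.0 * 1024.0);
        return gigabytes * (timeIncrementSeconds / 3600.0);
    }

    /** One stats collection cycle: measure once, then reuse the value twice. */
    static double collectStats(CountingLearner learner,
                               double timeIncrementSeconds,
                               List<String> out) {
        long byteSize = learner.measureByteSize(); // the only measurement in this cycle
        out.addAll(learner.getModelMeasurements(byteSize));
        return ramHoursIncrement(byteSize, timeIncrementSeconds);
    }

    public static void main(String[] args) {
        CountingLearner learner = new CountingLearner(100, 1024L * 1024L);
        List<String> measurements = new ArrayList<>();
        double increment = collectStats(learner, 3600.0, measurements);
        System.out.println("measureByteSize() calls: " + learner.measureCalls);
        System.out.println("RAM-hours increment: " + increment);
        System.out.println(measurements);
    }
}
```

With this shape, the cost of the size measurement is paid once per cycle regardless of how many consumers need the value; the same pattern would presumably apply to EvaluateInterleavedTestThenTrain.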