Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0
614 stars, 355 forks

measureByteSize() gets called twice in EvaluatePrequential for every stats collection cycle #272

Open nuwangunasekara opened 1 year ago

nuwangunasekara commented 1 year ago

measureByteSize() gets called twice in EvaluatePrequential for every stats collection cycle:

    double RAMHoursIncrement = learner.measureByteSize() / (1024.0 * 1024.0 * 1024.0); // GBs
    RAMHoursIncrement *= (timeIncrement / 3600.0); // Hours
    RAMHours += RAMHoursIncrement;
    lastEvaluateStartTime = evaluateTime;
    learningCurve.insertEntry(new LearningEvaluation(

especially at:

This can add significant computational overhead to periodic stats collection for ensemble methods such as SRP and ARF when they use a large number of base learners (e.g. 100).
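For context, the RAM-Hours increment in the snippet above is just the model size in GB multiplied by the elapsed time in hours. The following is a minimal sketch of that arithmetic, not MOA code: `RamHoursSketch` and `increment` are hypothetical names, and it assumes `timeIncrement` is expressed in seconds, as the division by 3600.0 suggests.

```java
// Hypothetical helper, not part of MOA: mirrors the arithmetic of the snippet above.
final class RamHoursSketch {
    static final double BYTES_PER_GB = 1024.0 * 1024.0 * 1024.0;
    static final double SECONDS_PER_HOUR = 3600.0;

    /** RAM-Hours contributed by a model of byteSize bytes over timeIncrementSeconds (assumed seconds). */
    static double increment(long byteSize, double timeIncrementSeconds) {
        double gigabytes = byteSize / BYTES_PER_GB;
        double hours = timeIncrementSeconds / SECONDS_PER_HOUR;
        return gigabytes * hours;
    }

    public static void main(String[] args) {
        // A 512 MB model held for 36 seconds: 0.5 GB * 0.01 h.
        long halfGigabyte = 512L * 1024 * 1024;
        System.out.println(increment(halfGigabyte, 36.0));
    }
}
```

The arithmetic itself is cheap. The cost comes from obtaining the byte size that feeds into it, which is why measuring it more than once per cycle matters.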

A simple test with the default SRP parameters and the default stream:

moa.DoTask "EvaluatePrequential -l meta.StreamingRandomPatches -i 100000 -f 10000 -q 10000"

We could compute the byte size once per cycle and pass the already calculated value to getModelMeasurementsImpl().
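A rough sketch of this idea, using stand-in types rather than MOA classes: `CountingLearner`, `reportedSize`, and `StatsCycleSketch` are all hypothetical, and the sketch assumes the second measurement happens while the model measurements are being assembled. It only illustrates measuring once and reusing the value. The actual change in MOA would depend on how the measurement methods are wired together.

```java
// Hypothetical stand-ins for illustration; none of these types are MOA classes.
final class CountingLearner {
    private final long[] baseLearnerSizes;
    int measureCalls = 0; // how many times the size measurement ran

    CountingLearner(int numBaseLearners, long bytesEach) {
        baseLearnerSizes = new long[numBaseLearners];
        java.util.Arrays.fill(baseLearnerSizes, bytesEach);
    }

    // Stands in for an expensive measurement that visits every base learner.
    long measureByteSize() {
        measureCalls++;
        long total = 0;
        for (long size : baseLearnerSizes) {
            total += size;
        }
        return total;
    }

    // Current shape (assumed): the reported size is measured again.
    long reportedSize() {
        return measureByteSize();
    }

    // Proposed shape: the caller passes in the size it already measured.
    long reportedSize(long precomputedByteSize) {
        return precomputedByteSize;
    }
}

final class StatsCycleSketch {
    static double ramHours(long byteSize, double seconds) {
        return byteSize / (1024.0 * 1024.0 * 1024.0) * (seconds / 3600.0);
    }

    // One stats cycle that measures twice: once for RAM-Hours, once for reporting.
    static double statsCycleTwice(CountingLearner learner, double seconds) {
        double increment = ramHours(learner.measureByteSize(), seconds);
        learner.reportedSize();
        return increment;
    }

    // One stats cycle that measures once and reuses the value.
    static double statsCycleOnce(CountingLearner learner, double seconds) {
        long byteSize = learner.measureByteSize();
        double increment = ramHours(byteSize, seconds);
        learner.reportedSize(byteSize);
        return increment;
    }

    public static void main(String[] args) {
        CountingLearner before = new CountingLearner(100, 1024L);
        statsCycleTwice(before, 36.0);
        CountingLearner after = new CountingLearner(100, 1024L);
        statsCycleOnce(after, 36.0);
        System.out.println("measurements per cycle: " + before.measureCalls + " vs " + after.measureCalls);
    }
}
```

Both variants yield the same RAM-Hours increment. The only difference is how many times the ensemble is traversed per stats cycle.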

The same happens with EvaluateInterleavedTestThenTrain.

How to run the tests: test.txt