Prediction speed scales with training data size rather than output size

davidfstein commented 1 year ago

I am running some experiments with the Mulan wrapper. Particularly, I added the COCOA method from that repository and am running the following for training:

java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -split-percentage 100 -t "train.arff" -d "clf.dmp" -W weka.classifiers.trees.J48 and for inference: java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -t "train.arff" -T "test.arff" -l "clf.dmp" -W weka.classifiers.trees.J48

Notably, training time increases moderately but reasonably as "train.arff" grows. However, with a fixed "test.arff" size, inference time scales exponentially with "train.arff" size. It seems almost as if training is not actually occurring during the first command but rather in the second. My java is very rusty so perhaps that is indeed what is happening. Is this the expected behavior?

fracpete commented 1 year ago

I just submitted a fix (https://github.com/Waikato/meka/commit/0608eeffd56cbb109902719515b632559e21a6c7), that will allow you to evaluate a previously trained model on a test set. This wasn't possible before, the model always got retrained with the training data.

With the latest snapshot, you would use something like this:

java -cp "~/bin/meka-release-1.9.8-SNAPSHOT/lib/*" meka.classifiers.multilabel.MULAN -S COCOA -verbosity 8 -threshold 1 -T "test.arff" -l "clf.dmp"

davidfstein commented 1 year ago

Thanks for the quick fix!

I rebuilt from master, but I'm running into this error now:

java.lang.ArrayIndexOutOfBoundsException: Index 1341 out of bounds for length 1341 at weka.core.DenseInstance.value(DenseInstance.java:347) at mulan.transformations.BinaryRelevanceTransformation.transformInstance(BinaryRelevanceTransformation.java:126) at mulan.classifier.transformation.BinaryRelevance.makePredictionInternal(BinaryRelevance.java:83) at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113) at mulan.classifier.transformation.COCOA.makePredictionforThreshold(COCOA.java:305) at mulan.classifier.transformation.COCOA.makePredictionInternal(COCOA.java:324) at mulan.classifier.MultiLabelLearnerBase.makePrediction(MultiLabelLearnerBase.java:113) at meka.classifiers.multilabel.MULAN.distributionForInstance(MULAN.java:263) at meka.classifiers.multilabel.Evaluation.testClassifier(Evaluation.java:617) at meka.classifiers.multilabel.Evaluation.evaluateModel(Evaluation.java:419) at meka.classifiers.multilabel.Evaluation.runExperiment(Evaluation.java:301) at meka.classifiers.multilabel.ProblemTransformationMethod.runClassifier(ProblemTransformationMethod.java:172) at meka.classifiers.multilabel.ProblemTransformationMethod.evaluation(ProblemTransformationMethod.java:152) at meka.classifiers.multilabel.MULAN.main(MULAN.java:273)

fracpete commented 1 year ago

Please provide a minimal example that replicates this problem.

Waikato / meka

Prediction speed scales with training data size rather than output size #79