Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Right now, I have a classifier working with NgramsExtractor and MultinomialNaiveBayes training. However, when I change the text extractor to WordSequenceExtractor, it fails at the fitting stage with the error below (the same happens with UniqueWordSequenceExtractor):
6819 [main] INFO com.datumbox.framework.core.machinelearning.classification.MultinomialNaiveBayes - fit()
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.ClassCastException
at com.datumbox.framework.common.concurrency.ThreadMethods.forkJoinExecution(ThreadMethods.java:116)
at com.datumbox.framework.common.concurrency.ForkJoinStream.forEach(ForkJoinStream.java:56)
at com.datumbox.framework.core.machinelearning.common.abstracts.algorithms.AbstractNaiveBayes._fit(AbstractNaiveBayes.java:278)
at com.datumbox.framework.core.machinelearning.common.abstracts.AbstractTrainer.fit(AbstractTrainer.java:125)
at com.datumbox.framework.core.machinelearning.modelselection.Validator.validate(Validator.java:67)
at com.avrio.AVcgclassifier.Classification.main(Classification.java:131)
Caused by: java.util.concurrent.ExecutionException: java.lang.ClassCastException
at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:996)
at com.datumbox.framework.common.concurrency.ThreadMethods.forkJoinExecution(ThreadMethods.java:112)
... 5 more
Caused by: java.lang.ClassCastException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:590)
... 7 more
Caused by: java.lang.ClassCastException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:590)
at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:668)
at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:726)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:160)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:430)
at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:594)
at com.datumbox.framework.common.concurrency.ForkJoinStream.lambda$forEach$0(ForkJoinStream.java:55)
at java.base/java.util.concurrent.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1393)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:283)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1603)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: java.lang.ClassCastException: java.base/java.lang.String cannot be cast to java.base/java.lang.Number
at com.datumbox.framework.common.dataobjects.TypeInference.toDouble(TypeInference.java:163)
at com.datumbox.framework.core.machinelearning.common.abstracts.algorithms.AbstractNaiveBayes.lambda$_fit$1(AbstractNaiveBayes.java:284)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:291)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:747)
... 3 more
I assume there's some format change that causes this issue?
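The Naive Bayes model requires bag-of-words extractors, not sequence-based ones; other models, like LDA, require sequences. Which extractor to use depends on the type of model you are training. The ClassCastException above (String cannot be cast to Number) is consistent with that mismatch.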
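To illustrate the difference, here is a minimal sketch in plain Java. It does not use the Datumbox API: the class ExtractorShapes and the methods bagOfWords, wordSequence and toDouble are hypothetical names made up for this example, and the data shapes are an assumption about what the two kinds of extractor produce. It only shows why a model that expects numeric feature values would hit a cast error when it is handed a word sequence.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Datumbox.
class ExtractorShapes {

    // Bag-of-words style: token -> occurrence count (numeric values).
    static Map<String, Double> bagOfWords(String text) {
        Map<String, Double> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1.0, Double::sum);
            }
        }
        return counts;
    }

    // Sequence style: position -> token (string values).
    static Map<Integer, String> wordSequence(String text) {
        Map<Integer, String> sequence = new LinkedHashMap<>();
        int position = 0;
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                sequence.put(position++, token);
            }
        }
        return sequence;
    }

    // Stand-in for a numeric conversion that a count-based model might apply
    // to each feature value; it throws ClassCastException for a String.
    static double toDouble(Object value) {
        return ((Number) value).doubleValue();
    }

    public static void main(String[] args) {
        String text = "the cat saw the dog";

        // Numeric values convert without trouble.
        for (Object value : bagOfWords(text).values()) {
            System.out.println(toDouble(value));
        }

        // String values fail in the same way as the reported error.
        try {
            toDouble(wordSequence(text).get(0));
        } catch (ClassCastException e) {
            System.out.println("Cannot convert a word to a number: " + e.getMessage());
        }
    }
}
```

In this sketch the first representation maps each word to a count, which a frequency-based model can consume directly, while the second maps each position to the word itself, so there is no number to read.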