Closed kyllohd closed 4 years ago
I first ran into this in Meka, then switched to Weka 3.8.3 to test it as a standalone thing. The issue is with TextEmbedding (I tried both the CNN embedding and the RNN embedding).
There are 3 files here:
tmdb-embeddings.arff - the embeddings I was trying to use in weka.classifiers.functions.Dl4jMlpClassifier
tmdb_dummytrain.arff - training set
tmdb_dummytest.arff - test set
https://www.mediafire.com/file/phq192elgynm7sz/embeddings.zip/file
Steps to reproduce (cnn):
Actual results: You won't even be able to start the classification
Steps to reproduce (rnn):
Actual results: You can start the classification, but you get an error saying that it cannot handle string attributes:
08:33:34: Started weka.classifiers.functions.Dl4jMlpClassifier
08:33:34: Command: weka.classifiers.functions.Dl4jMlpClassifier -S 0 -cache-mode MEMORY -early-stopping "weka.dl4j.earlystopping.EarlyStopping -maxEpochsNoImprovement 0 -valPercentage 0.0" -normalization "Standardize training data" -iterator "weka.dl4j.iterators.instance.sequence.text.rnn.RnnTextEmbeddingInstanceIterator -stopWords \"weka.dl4j.text.stopwords.Dl4jRainbow \" -tokenPreProcessor \"weka.dl4j.text.tokenization.preprocessor.CommonPreProcessor \" -tokenizerFactory \"weka.dl4j.text.tokenization.tokenizer.factory.DefaultTokenizerFactory \" -truncationLength 100 -wordVectorLocation E:\mestrado\bases\embeddings\tmdb-embeddings.arff -bs 1" -iteration-listener "weka.dl4j.listener.EpochListener -eval true -n 5" -layer "weka.dl4j.layers.BatchNormalization -beta 0.0 -decay 0.9 -eps 1.0E-5 -gamma 1.0 -beta false -nOut 0 -activation \"weka.dl4j.activations.ActivationIdentity \" -name \"Batch normalization layer\"" -layer "weka.dl4j.layers.DenseLayer -nOut 0 -activation \"weka.dl4j.activations.ActivationReLU \" -name \"Dense layer\"" -layer "weka.dl4j.layers.SubsamplingLayer -mode Same -eps 1.0E-8 -rows 2 -columns 2 -paddingColumns 0 -paddingRows 0 -pnorm 0 -poolingType MAX -strideColumns 2 -strideRows 2 -name maxpool1" -layer "weka.dl4j.layers.DenseLayer -nOut 500 -activation \"weka.dl4j.activations.ActivationReLU \" -name ffn1" -layer "weka.dl4j.layers.RnnOutputLayer -lossFn \"weka.dl4j.lossfunctions.LossMCXENT \" -nOut 2 -activation \"weka.dl4j.activations.ActivationSoftmax \" -name \"RnnOutput layer\"" -logConfig "weka.core.LogConfiguration -append true -dl4jLogLevel WARN -logFile C:\Users\mansu\wekafiles\wekaDeeplearning4j.log -nd4jLogLevel INFO -wekaDl4jLogLevel INFO" -config "weka.dl4j.NeuralNetConfiguration -biasInit 0.0 -biasUpdater \"weka.dl4j.updater.Sgd -lr 0.001 -lrSchedule \\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\"\" -dist \"weka.dl4j.distribution.Disabled \" -dropout \"weka.dl4j.dropout.Disabled \" -gradientNormalization None -gradNormThreshold 1.0 -l1 NaN -l2 NaN -minimize -algorithm STOCHASTIC_GRADIENT_DESCENT -updater \"weka.dl4j.updater.Adam -beta1MeanDecay 0.9 -beta2VarDecay 0.999 -epsilon 1.0E-8 -lr 0.001 -lrSchedule \\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\"\" -weightInit XAVIER -weightNoise \"weka.dl4j.weightnoise.Disabled \"" -numEpochs 10 -numGPUs 1 -averagingFrequency 10 -prefetchSize 24 -queueSize 0 -zooModel "weka.dl4j.zoo.CustomNet " -output-debug-info -num-decimal-places 4
08:33:34: weka.classifiers.functions.Dl4jMlpClassifier: Cannot handle string attributes!
Hi, I tried to use RnnTextEmbeddingInstanceIterator for the IMDB dataset, but it threw the error "Dl4jClassifier: cannot handle string attributes!" Any help please?
@kyllohd Sorry for the late reply. You are using the wrong model: please choose RnnSequenceClassifier instead of Dl4jMlpClassifier.
@zahrashuaib The same goes for you.
Describe the bug
After generating the word embeddings with Weka 3.8 (weka.filters.unsupervised.attribute.Dl4jStringToWord2Vec), I tried to use these embeddings in Meka 1.9.2.
To Reproduce
Expected behavior
You should be able to use Dl4jMlpClassifier with the Binary Relevance (BR) method in Meka, with embeddings generated in Weka.
Additional Information
Error
[INFO ] 16:21:57.131 [Thread-6] weka.classifiers.functions.Dl4jMlpClassifier - Building on 6296 training instances
meka.gui.explorer.ClassifyTab Evaluation failed (train/test split):
weka.core.InvalidInputDataException: An ARFF is required with a string attribute and a class attribute
    at weka.dl4j.iterators.instance.sequence.text.rnn.RnnTextEmbeddingInstanceIterator.validate(RnnTextEmbeddingInstanceIterator.java:57)
    at weka.dl4j.iterators.instance.sequence.text.rnn.RnnTextEmbeddingInstanceIterator.getDataSetIterator(RnnTextEmbeddingInstanceIterator.java:80)
    at weka.dl4j.iterators.instance.AbstractInstanceIterator.getDataSetIterator(AbstractInstanceIterator.java:59)
    at weka.classifiers.functions.Dl4jMlpClassifier.getDataSetIterator(Dl4jMlpClassifier.java:1069)
    at weka.classifiers.functions.Dl4jMlpClassifier.getDataSetIterator(Dl4jMlpClassifier.java:1121)
    at weka.classifiers.functions.Dl4jMlpClassifier.getFirstBatchFeatures(Dl4jMlpClassifier.java:1449)
    at weka.classifiers.functions.Dl4jMlpClassifier.createModel(Dl4jMlpClassifier.java:1294)
    at weka.classifiers.functions.Dl4jMlpClassifier.finishClassifierInitialization(Dl4jMlpClassifier.java:957)
    at weka.classifiers.functions.Dl4jMlpClassifier.initializeClassifier(Dl4jMlpClassifier.java:899)
    at weka.classifiers.functions.Dl4jMlpClassifier.buildClassifier(Dl4jMlpClassifier.java:816)
    at meka.classifiers.multilabel.BR.buildClassifier(BR.java:75)
    at meka.classifiers.multilabel.Evaluation.evaluateModel(Evaluation.java:428)
    at meka.classifiers.multilabel.Evaluation.evaluateModel(Evaluation.java:326)
    at meka.gui.explorer.ClassifyTab$7.run(ClassifyTab.java:414)
    at java.lang.Thread.run(Unknown Source)
    at meka.gui.explorer.AbstractThreadedExplorerTab$WorkerThread.run(AbstractThreadedExplorerTab.java:78)
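The exception above says the iterator expects an ARFF with a string attribute and a class attribute. One quick way to see what a dataset actually declares is to list the attribute types from its header. The sketch below does only that, using nothing but the Java standard library. `ArffHeaderCheck` and its methods are hypothetical names made up for illustration; they are not part of Weka, Meka, or WekaDeeplearning4j, and the parsing is a simplification that may not cover every ARFF edge case. Which attribute is the class is decided when the data is loaded, so the header alone cannot confirm that part.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Weka, Meka, or WekaDeeplearning4j. */
final class ArffHeaderCheck {

    /** Returns the declared type of every @attribute line in the header, in file order. */
    static List<String> attributeTypes(List<String> lines) {
        List<String> types = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("@data")) {
                break; // the header ends where the data section starts
            }
            if (lower.startsWith("@attribute")) {
                types.add(typeOf(line.substring("@attribute".length()).trim()));
            }
        }
        return types;
    }

    /** Classifies the text after the "@attribute" keyword: name first, then the type. */
    static String typeOf(String rest) {
        int nameEnd = 0;
        if (rest.startsWith("'") || rest.startsWith("\"")) {
            // Quoted name: ends at the next matching quote (escaped quotes are not handled).
            nameEnd = rest.indexOf(rest.charAt(0), 1) + 1;
        } else {
            // Unquoted name: ends at the first whitespace or at an opening brace.
            while (nameEnd < rest.length()
                    && !Character.isWhitespace(rest.charAt(nameEnd))
                    && rest.charAt(nameEnd) != '{') {
                nameEnd++;
            }
        }
        String decl = rest.substring(nameEnd).trim();
        if (decl.startsWith("{")) {
            return "nominal";
        }
        String word = decl.split("\\s+")[0].toLowerCase(Locale.ROOT);
        switch (word) {
            case "numeric":
            case "real":
            case "integer":
                return "numeric";
            case "string":
            case "date":
            case "relational":
                return word;
            default:
                return "unknown";
        }
    }

    /** True if at least one attribute is declared as string. */
    static boolean hasStringAttribute(List<String> types) {
        return types.contains("string");
    }

    public static void main(String[] args) throws IOException {
        List<String> types = attributeTypes(Files.readAllLines(Paths.get(args[0])));
        for (int i = 0; i < types.size(); i++) {
            System.out.println(i + ": " + types.get(i));
        }
        System.out.println("has string attribute: " + hasStringAttribute(types));
    }
}
```

On Java 11 or later this can be run directly, for example `java ArffHeaderCheck.java tmdb_dummytrain.arff`. If no attribute is reported as `string`, the dataset has probably already been converted to numeric features, which would be consistent with the message in the stack trace.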