Waikato / meka

Multi-label classifiers and evaluation procedures using the Weka machine learning framework.
http://waikato.github.io/meka/
GNU General Public License v3.0

Unexpected Meka Evaluation Result #55

Open Mali-DS opened 5 years ago

Mali-DS commented 5 years ago

Hi, the result of my evaluation is zero and I don't know why. My code is here:

    try {
        ConverterUtils.DataSource dataSource = new ConverterUtils.DataSource(FILE_PATH); // original dataset
        Instances preparedDataSet = dataSource.getDataSet();
        preparedDataSet = filterUnsupervisedAttributes(preparedDataSet);
        preparedDataSet.setClassIndex(7);

        CRUpdateable classifier = new CRUpdateable();
        RandomForest randomForest = createRandomForest(1);
        classifier.setClassifier(randomForest);

        Instances  trainingInstances = new Instances(dataSource.getStructure()); // temporary dataset for train
        trainingInstances = filterUnsupervisedAttributes(trainingInstances);
        trainingInstances.setClassIndex(7);

        Instances testInstances = new Instances(dataSource.getStructure()); // temporary dataset for test
        testInstances = filterUnsupervisedAttributes(testInstances);
        testInstances.setClassIndex(7);
        int countTestInstances = 0;
        int countTrainInstances = 0;
        boolean firstTrain = true;
        boolean benchTest = true;
        int numInst = preparedDataSet.numInstances();
        for(int row = 123; row < 5021; row++) {
                Instance trainingInstance = preparedDataSet.instance(row);
                trainingInstances.add(trainingInstance); // collect instances to use as training
                countTrainInstances++;
                if (firstTrain && countTrainInstances%100 == 0 ) {  // train the classifier with the first 100 instances(without any missing values)
                    firstTrain = false;
                    classifier.buildClassifier(trainingInstances);
                }
                if(!firstTrain){
                    benchTest = true;

                    // classifier.updateClassifier(trainingInstance);

                    for(int j=row+1;j<row+101;j++){
                        if(benchTest && countTestInstances != 100) { // add next 100 instances to testInstance
                            Instance testInstance = preparedDataSet.instance(j);
                            testInstances.add(testInstance);
                            countTestInstances++;

                            if (countTestInstances % 100 == 0) {
                                System.out.println("Evaluate CRUpdateable classifier on ");
                                String top = "PCut1";
                                String vop = "3";
                                Result result = Evaluation.evaluateModel(classifier, trainingInstances , testInstances, top, vop);
                                System.out.println("Evaluation available metrics: " + result.availableMetrics());
                                System.out.println("Evaluation Info: " + result.toString());
                                System.out.println("Levenshtein distance: " + result.getValue("Levenshtein distance"));
                                System.out.println("Type: " + result.getInfo("Type"));
                                countTestInstances = 0;
                                benchTest = false;
                                testInstances.delete();
                            }
                        }
                    }
                }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }

The result of Evaluation is here:

Evaluation Info: == Evaluation Info

Classifier meka.classifiers.multiltarget.incremental.CRUpdateable
Options [-W, weka.classifiers.trees.RandomForest, --, -P, 100, -I, 1, -num-slots, 1, -K, 0, -M, 1.0, -V, 0.001, -S, 1]
Additional Info
Dataset Missing_values_Predicted-weka.filters.unsupervised.attribute.RemoveType-Tstring
Number of labels (L) 7
Type MT
Verbosity 3

== Predictive Performance

N(test) 100 L 7
Hamming score 0
Exact match 0
Hamming loss 1
ZeroOne loss 1
Levenshtein distance 1
Label indices [ 0 1 2 3 4 5 6 ]
Accuracy (per label) [ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ]

== Additional Measurements

Number of training instances 154
Number of test instances 100
Label cardinality (train set) 659.407
Label cardinality (test set) 676.757
Build Time 0.061
Test Time 0.006
Total Time 0.067
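One figure in the output above may be worth a second look: the label cardinality. The sketch below assumes cardinality is the mean, over instances, of the summed label values. That is my reading of the metric and is not confirmed in this thread. Under that assumption, binary 0/1 labels can never produce a value above the number of labels (7 here), so values in the hundreds would suggest that the seven label attributes hold numeric or multi-valued data. The class name, method name, and sample data are hypothetical and do not use Meka's API.

```java
final class LabelCardinalitySketch {

    // Assumed definition: mean over instances of the summed label values.
    static double labelCardinality(double[][] labels) {
        if (labels.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double[] row : labels) {
            for (double value : row) {
                sum += value;
            }
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        // Hypothetical data: two instances with three label attributes each.
        double[][] binaryLabels = {{1, 0, 1}, {0, 0, 1}};
        double[][] numericLabels = {{120, 95, 310}, {80, 101, 275}};

        // With 0/1 labels the result is bounded by the number of labels (3 here).
        System.out.println("binary:  " + labelCardinality(binaryLabels));
        // With raw numeric values the result can be arbitrarily large.
        System.out.println("numeric: " + labelCardinality(numericLabels));
    }
}
```

If the real label attributes look like the second array, the data is a multi-target problem with many distinct values per target, which makes exact per-label matches much harder to achieve.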

fracpete commented 5 years ago

From a quick glance, you seem to treat the data like you would for Weka. However, Meka works a bit differently. See the following examples:

A final remark: you only seem to have a single class attribute...
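As background on how Meka's data handling differs from Weka's: as far as I know, Meka datasets usually declare the number of label attributes in the ARFF relation name through a `-C` option (for example `'Music: -C 6'`), and Meka's own loading helper (`MLUtils.prepareData`, to my knowledge) reads that option to set the class index. The sketch below only illustrates the naming convention with plain Java. The class, method, and regular expression are hypothetical and are not Meka's parser.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RelationNameSketch {

    // Hypothetical pattern: "-C" followed by whitespace and a (possibly negative) integer.
    private static final Pattern LABEL_COUNT = Pattern.compile("-C\\s+(-?\\d+)");

    // Returns the declared label count, or empty if the relation name has no -C option.
    static OptionalInt labelCount(String relationName) {
        Matcher matcher = LABEL_COUNT.matcher(relationName);
        if (matcher.find()) {
            return OptionalInt.of(Integer.parseInt(matcher.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // A relation name that follows the convention.
        System.out.println(labelCount("Music: -C 6"));
        // The relation name reported in the evaluation output of this issue.
        System.out.println(labelCount(
            "Missing_values_Predicted-weka.filters.unsupervised.attribute.RemoveType-Tstring"));
    }
}
```

The relation name from this issue carries no `-C` option, so the number of labels has to come from somewhere else, such as the explicit `setClassIndex(7)` call in the original code.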

Mali-DS commented 5 years ago

Thanks for your answer; you raised good points. I changed my code to do things the Meka way, and it now looks like this:

    try {
        ConverterUtils.DataSource dataSource = new ConverterUtils.DataSource(FILE_PATH); // original dataset
        Instances preparedDataSet = dataSource.getDataSet();

        CRUpdateable classifier = new CRUpdateable();
        RandomForest randomForest = createRandomForest(1);  // random forest is not updatable classifier
        classifier.setClassifier(randomForest);

        Instances  trainingInstances = new Instances(dataSource.getStructure()); 
        Instances testInstances = new Instances(dataSource.getStructure());
        int countTestInstances = 0;
        int countTrainInstances = 0;
        boolean firstTrain = true;
        boolean benchTest = true;
        for(int row = 123; row < 5021; row++) {
                Instance trainingInstance = preparedDataSet.instance(row);
                trainingInstances.add(trainingInstance); // collect instances to use as training
                countTrainInstances++;
                if (firstTrain && countTrainInstances%100 == 0 ) { 
                    trainingInstances = PrepareClassAttributes(trainingInstances,"1,2,3,4,5,6,7");
                    firstTrain = false;
                    classifier.buildClassifier(trainingInstances);
                }
                if(!firstTrain){
                    benchTest = true;
                    classifier.updateClassifier(trainingInstance);
                    for(int j=row+1;j<row+101;j++){
                        if(benchTest && countTestInstances != 100) { 
                            Instance testInstance = preparedDataSet.instance(j);
                            testInstances.add(testInstance);
                            countTestInstances++;
                            if (countTestInstances % 100 == 0) {
                                testInstances = PrepareClassAttributes(testInstances,"1,2,3,4,5,6,7");
                                System.out.println("Evaluate CRUpdateable classifier on ");
                                String top = "PCut1"; 
                                String vop = "3";  
                                Result result = Evaluation.evaluateModel(classifier, trainingInstances , testInstances, top, vop);
                                System.out.println("Evaluation Info: " + result.toString());
                                countTestInstances = 0;
                                benchTest = false;
                                testInstances.delete();
                            }
                        }
                    }
                }
        }

    } catch (Exception e) {
        e.printStackTrace();
    }

But the accuracy is still zero, and the statistics look strange:

N(test) 100 L 7
Hamming score 0
Exact match 0
Hamming loss 1
ZeroOne loss 1
Levenshtein distance 1
Label indices [ 0 1 2 3 4 5 6 ]
Accuracy (per label) [ 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ]

jmread commented 5 years ago

Actually, the stats results make sense given that there are 0 correct predictions. Without being familiar with your data, it is difficult to know whether this is 'strange' or not. Have you tried getting results using a simple test in the GUI first? Or printing out the prediction for each instance?
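To illustrate why the all-zero report is internally consistent, here is a minimal sketch that assumes the usual definitions: Hamming score is the fraction of individual label values predicted correctly, exact match is the fraction of instances whose whole label vector is correct, and the two losses are one minus those scores. It also prints each prediction next to the true values, in the spirit of the last suggestion. The class, methods, and data are hypothetical and do not use Meka's API.

```java
import java.util.Arrays;

final class MatchMetricsSketch {

    // Fraction of individual label values that were predicted correctly.
    static double hammingScore(int[][] truth, int[][] pred) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < truth.length; i++) {
            for (int j = 0; j < truth[i].length; j++) {
                if (truth[i][j] == pred[i][j]) {
                    correct++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    // Fraction of instances whose entire label vector was predicted correctly.
    static double exactMatch(int[][] truth, int[][] pred) {
        if (truth.length == 0) {
            return 0.0;
        }
        int matches = 0;
        for (int i = 0; i < truth.length; i++) {
            if (Arrays.equals(truth[i], pred[i])) {
                matches++;
            }
        }
        return (double) matches / truth.length;
    }

    public static void main(String[] args) {
        // Hypothetical multi-target data: no predicted value equals its true value.
        int[][] truth = {{3, 1}, {2, 5}};
        int[][] pred = {{0, 0}, {0, 0}};

        // Print each instance's true and predicted values for inspection.
        for (int i = 0; i < truth.length; i++) {
            System.out.println("true=" + Arrays.toString(truth[i])
                + " predicted=" + Arrays.toString(pred[i]));
        }

        double hamming = hammingScore(truth, pred);
        double exact = exactMatch(truth, pred);
        System.out.println("Hamming score " + hamming + ", Hamming loss " + (1.0 - hamming));
        System.out.println("Exact match " + exact + ", ZeroOne loss " + (1.0 - exact));
    }
}
```

When no single value is predicted correctly, both scores are 0 and both losses are 1, which matches the pattern in the reports above. The per-instance printout is usually the quickest way to see whether the predictions are on a different scale or encoding than the true values.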