automl / autoweka

Auto-WEKA
http://www.cs.ubc.ca/labs/beta/Projects/autoweka/

Correctly Classified Instances decreased when more time was given #80

Closed: timhot closed this issue 3 years ago.

timhot commented 4 years ago

This may just be my lack of understanding...

I first did a quick (15 min?) test on my data; Auto-WEKA tried 37 configurations and reported an accuracy of 93%.

I then did a 3-day run: 580 configurations were tried, and the reported accuracy was 67%.

This seems very odd to me; perhaps I am missing something crucial?

The full outputs are below in case they help you understand what I did.

Thanks for any suggestions that you can give.

Tim

P.S. I am only really interested in the TP and FP rates for the morepork_more-pork class.


Quick test

Auto-WEKA result:
best classifier: weka.classifiers.trees.RandomForest
arguments: [-I, 10, -K, 0, -depth, 0]
attribute search: null
attribute search arguments: []
attribute evaluation: null
attribute evaluation arguments: []
metric: errorRate
estimated errorRate: 0.013625789298770355
training time on evaluation dataset: 0.186 seconds

You can use the chosen classifier in your own code as follows:

Classifier classifier = AbstractClassifier.forName("weka.classifiers.trees.RandomForest", new String[]{"-I", "10", "-K", "0", "-depth", "0"});
classifier.buildClassifier(instances);

Correctly Classified Instances        2813               93.4862 %
Incorrectly Classified Instances       196                6.5138 %
Kappa statistic                          0.9066
Mean absolute error                      0.0193
Root mean squared error                  0.0816
Relative absolute error                 27.3209 %
Root relative squared error             43.4984 %
Total Number of Instances             3009

=== Confusion Matrix ===

    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t   <-- classified as
 1464    7    0    0    3    0    0   12    0    0    0    0    0    1    1    0    0    0    0    0 | a = morepork_more-pork
   25  479    1    0    2    0    0   30    0    0    1    0    1    0    0    0    1    1    0    0 | b = unknown
    1    1   16    0    0    0    0    1    0    0    1    0    0    0    0    0    0    0    0    0 | c = siren
    3    1    0   27    0    0    0    3    0    0    0    0    0    0    0    0    0    0    0    0 | d = dog
    5    1    0    0   68    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0 | e = duck
    1    1    0    0    0  107    0    1    0    0    0    0    0    0    0    0    0    0    0    0 | f = dove
    4    2    0    0    1    0   23    1    0    0    0    0    0    0    0    0    0    0    0    0 | g = human
   34   15    0    0    1    1    0  205    0    0    1    0    0    0    1    0    0    0    0    0 | h = bird
    0    2    0    0    0    0    0    1   22    0    0    0    0    0    0    0    0    0    0    0 | i = car
    0    0    0    0    0    0    0    0    0   23    0    0    0    0    0    0    0    0    0    0 | j = rumble
    2    6    0    0    0    2    0    4    0    0  249    0    0    0    0    0    0    0    0    0 | k = white_noise
    0    0    0    0    0    0    0    2    0    0    0    8    0    0    0    0    0    0    0    0 | l = cow
    1    0    0    0    0    0    0    0    0    0    0    0    3    0    0    0    0    0    0    0 | m = buzzy_insect
    0    2    0    0    0    1    0    1    0    0    0    0    0   98    0    0    0    0    0    0 | n = plane
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    6    0    0    0    0    0 | o = hammering
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0    0    0    0 | p = frog
    1    1    0    0    0    0    0    2    0    0    0    0    0    0    0    0    8    0    0    0 | q = morepork_more-pork_part
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    3    0    0 | r = chainsaw
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    2    0 | s = crackle
    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1 | t = car_horn

=== Detailed Accuracy By Class ===

             TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
             0.984    0.051    0.950      0.984    0.967      0.933    0.996     0.996     morepork_more-pork
             0.885    0.016    0.925      0.885    0.905      0.885    0.989     0.963     unknown
             0.800    0.000    0.941      0.800    0.865      0.867    0.999     0.930     siren
             0.794    0.000    1.000      0.794    0.885      0.890    0.999     0.954     dog
             0.907    0.002    0.907      0.907    0.907      0.904    0.999     0.977     duck
             0.973    0.001    0.964      0.973    0.968      0.967    1.000     0.993     dove
             0.742    0.000    1.000      0.742    0.852      0.860    0.997     0.919     human
             0.795    0.021    0.779      0.795    0.787      0.767    0.989     0.902     bird
             0.880    0.000    1.000      0.880    0.936      0.938    1.000     0.956     car
             1.000    0.000    1.000      1.000    1.000      1.000    1.000     1.000     rumble
             0.947    0.001    0.988      0.947    0.967      0.964    0.999     0.991     white_noise
             0.800    0.000    0.889      0.800    0.842      0.843    0.999     0.871     cow
             0.750    0.000    0.750      0.750    0.750      0.750    0.999     0.654     buzzy_insect
             0.961    0.000    0.990      0.961    0.975      0.974    1.000     0.992     plane
             1.000    0.001    0.750      1.000    0.857      0.866    1.000     0.958     hammering
             1.000    0.000    1.000      1.000    1.000      1.000    1.000     1.000     frog
             0.667    0.000    0.889      0.667    0.762      0.769    0.996     0.737     morepork_more-pork_part
             1.000    0.000    0.750      1.000    0.857      0.866    1.000     0.806     chainsaw
             1.000    0.000    1.000      1.000    1.000      1.000    1.000     1.000     crackle
             1.000    0.000    1.000      1.000    1.000      1.000    1.000     1.000     car_horn

Weighted Avg.    0.935    0.030    0.936      0.935    0.934      0.912    0.995     0.977
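Since the question is mainly about the TP and FP rates for morepork_more-pork, it may help to see how those per-class columns follow from the confusion matrix: treat class a as "positive" and every other class as "negative". The plain-Java sketch below has no Weka dependency; the class and method names are illustrative, not part of Weka's API. It copies the quick-run matrix and recomputes the first row of the table. Weka itself exposes comparable per-class values through `weka.classifiers.Evaluation` (e.g. `truePositiveRate(int classIndex)` and `falsePositiveRate(int classIndex)`).

```java
import java.util.Locale;

/**
 * One-vs-rest rates for a single class, computed from a Weka-style confusion
 * matrix in which rows are the actual class and columns the predicted class.
 */
public class PerClassRates {

    // Quick-run confusion matrix copied from the output above; index 0 = a = morepork_more-pork.
    static final int[][] QUICK_RUN = {
        {1464, 7, 0, 0, 3, 0, 0, 12, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0},
        {25, 479, 1, 0, 2, 0, 0, 30, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0},
        {1, 1, 16, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {3, 1, 0, 27, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {5, 1, 0, 0, 68, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0},
        {1, 1, 0, 0, 0, 107, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {4, 2, 0, 0, 1, 0, 23, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {34, 15, 0, 0, 1, 1, 0, 205, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0},
        {0, 2, 0, 0, 0, 0, 0, 1, 22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {2, 6, 0, 0, 0, 2, 0, 4, 0, 0, 249, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0},
        {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0},
        {0, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 98, 0, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0},
        {1, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0},
        {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1},
    };

    /** TP rate (recall): instances of class k predicted as k / all actual instances of k. */
    static double tpRate(int[][] m, int k) {
        return (double) m[k][k] / rowSum(m, k);
    }

    /** FP rate: instances of other classes predicted as k / all instances not of class k. */
    static double fpRate(int[][] m, int k) {
        int falsePositives = colSum(m, k) - m[k][k];
        int negatives = total(m) - rowSum(m, k);
        return (double) falsePositives / negatives;
    }

    /** Precision: correct predictions of k / all predictions of k. */
    static double precision(int[][] m, int k) {
        return (double) m[k][k] / colSum(m, k);
    }

    static int rowSum(int[][] m, int r) {
        int s = 0;
        for (int v : m[r]) s += v;
        return s;
    }

    static int colSum(int[][] m, int c) {
        int s = 0;
        for (int[] row : m) s += row[c];
        return s;
    }

    static int total(int[][] m) {
        int s = 0;
        for (int r = 0; r < m.length; r++) s += rowSum(m, r);
        return s;
    }

    public static void main(String[] args) {
        int morepork = 0; // class "a" is the first row and column
        System.out.printf(Locale.ROOT, "TP rate %.3f, FP rate %.3f, precision %.3f%n",
                tpRate(QUICK_RUN, morepork), fpRate(QUICK_RUN, morepork), precision(QUICK_RUN, morepork));
    }
}
```

Running it should print `TP rate 0.984, FP rate 0.051, precision 0.950`, matching the morepork_more-pork row (1464/1488, 77/1521 and 1464/1541). Changing the index to 1 reproduces the unknown row.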

Temporary run directories: /tmp/autoweka6555860488177895781/

For better performance, try giving Auto-WEKA more time. Tried 37 configurations; to get good results reliably you may need to allow for trying thousands of configurations.


3-day run

Auto-WEKA result:
best classifier: weka.classifiers.functions.SMO
arguments: [-C, 1.0322930159130057, -N, 0, -K, weka.classifiers.functions.supportVector.RBFKernel -G 0.4733376743447805]
attribute search: null
attribute search arguments: []
attribute evaluation: null
attribute evaluation arguments: []
metric: errorRate
estimated errorRate: 0.21635094715852443
training time on evaluation dataset: 2.835 seconds

You can use the chosen classifier in your own code as follows:

Classifier classifier = AbstractClassifier.forName("weka.classifiers.functions.SMO", new String[]{"-C", "1.0322930159130057", "-N", "0", "-K", "weka.classifiers.functions.supportVector.RBFKernel -G 0.4733376743447805"});
classifier.buildClassifier(instances);

Correctly Classified Instances        2024               67.2649 %
Incorrectly Classified Instances       985               32.7351 %
Kappa statistic                          0.4836
Mean absolute error                      0.0904
Root mean squared error                  0.2094
Relative absolute error                128.0776 %
Root relative squared error            111.589  %
Total Number of Instances             3009

=== Confusion Matrix ===

    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t   <-- classified as
 1445   42    0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | a = morepork_more-pork
  178  326    0    0    0   17    0    8    0    0   12    0    0    0    0    0    0    0    0    0 | b = unknown
   10   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | c = siren
   16   16    0    0    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | d = dog
   51   17    0    0    7    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | e = duck
    0   45    0    0    0   65    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | f = dove
   21   10    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | g = human
  124   97    0    0    0    2    0   32    0    0    3    0    0    0    0    0    0    0    0    0 | h = bird
    2   20    0    0    0    2    0    0    0    0    1    0    0    0    0    0    0    0    0    0 | i = car
    3   18    0    0    0    0    0    0    0    0    2    0    0    0    0    0    0    0    0    0 | j = rumble
   17   92    0    0    0    6    0    0    0    0  148    0    0    0    0    0    0    0    0    0 | k = white_noise
    2    6    0    0    0    1    0    0    0    0    1    0    0    0    0    0    0    0    0    0 | l = cow
    1    3    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | m = buzzy_insect
   13   71    0    0    0    3    0    1    0    0   13    0    0    1    0    0    0    0    0    0 | n = plane
    2    4    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | o = hammering
    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | p = frog
    6    6    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | q = morepork_more-pork_part
    2    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | r = chainsaw
    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | s = crackle
    1    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 | t = car_horn
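The summary for this run also reports a Kappa statistic of 0.4836, which corrects raw accuracy for the agreement expected by chance given how often each class occurs and is predicted: kappa = (p_o - p_e) / (1 - p_e), where p_o is the fraction on the diagonal and p_e = sum over classes of (row total x column total) / N^2. The sketch below computes standard Cohen's kappa from the 3-day-run matrix in plain Java. The names are illustrative, and the formula is the textbook one rather than code taken from Weka, although it reproduces the reported figure here.

```java
import java.util.Locale;

/**
 * Overall accuracy and Cohen's kappa from a confusion matrix
 * (rows = actual class, columns = predicted class).
 */
public class KappaFromMatrix {

    // 3-day-run confusion matrix copied from the output above (classes a..t).
    static final int[][] LONG_RUN = {
        {1445, 42, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {178, 326, 0, 0, 0, 17, 0, 8, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {10, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {16, 16, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {51, 17, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 45, 0, 0, 0, 65, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {21, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {124, 97, 0, 0, 0, 2, 0, 32, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {2, 20, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {3, 18, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {17, 92, 0, 0, 0, 6, 0, 0, 0, 0, 148, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {2, 6, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {13, 71, 0, 0, 0, 3, 0, 1, 0, 0, 13, 0, 0, 1, 0, 0, 0, 0, 0, 0},
        {2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {6, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
    };

    /** Fraction of instances on the diagonal (correctly classified). */
    static double accuracy(int[][] m) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < m.length; i++) {
            correct += m[i][i];
            for (int v : m[i]) total += v;
        }
        return (double) correct / total;
    }

    /** Cohen's kappa: observed agreement corrected for chance agreement. */
    static double kappa(int[][] m) {
        int n = m.length;
        long[] rowSums = new long[n];
        long[] colSums = new long[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                rowSums[i] += m[i][j];
                colSums[j] += m[i][j];
                total += m[i][j];
            }
        }
        double observed = accuracy(m);
        double expected = 0;
        for (int i = 0; i < n; i++) {
            expected += (double) rowSums[i] * colSums[i];
        }
        expected /= (double) total * total;
        return (observed - expected) / (1 - expected);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "accuracy %.4f, kappa %.4f%n", accuracy(LONG_RUN), kappa(LONG_RUN));
    }
}
```

It should print `accuracy 0.6726, kappa 0.4836`, in line with the summary above. Applied to the quick-run matrix, the same function gives about 0.9066.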

=== Detailed Accuracy By Class ===

             TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
             0.971    0.296    0.763      0.971    0.854      0.699    0.847     0.763     morepork_more-pork
             0.603    0.186    0.415      0.603    0.491      0.364    0.712     0.326     unknown
             0.000    0.000    ?          0.000    ?          ?        0.786     0.109     siren
             0.000    0.000    ?          0.000    ?          ?        0.802     0.032     dog
             0.093    0.000    1.000      0.093    0.171      0.302    0.785     0.226     duck
             0.591    0.012    0.657      0.591    0.622      0.609    0.973     0.559     dove
             0.000    0.000    ?          0.000    ?          ?        0.746     0.022     human
             0.124    0.003    0.780      0.124    0.214      0.292    0.766     0.262     bird
             0.000    0.000    ?          0.000    ?          ?        0.900     0.093     car
             0.000    0.000    ?          0.000    ?          ?        0.934     0.210     rumble
             0.563    0.012    0.822      0.563    0.668      0.656    0.917     0.619     white_noise
             0.000    0.000    ?          0.000    ?          ?        0.855     0.047     cow
             0.000    0.000    ?          0.000    ?          ?        0.802     0.252     buzzy_insect
             0.010    0.000    1.000      0.010    0.019      0.097    0.876     0.213     plane
             0.000    0.000    ?          0.000    ?          ?        0.860     0.153     hammering
             0.000    0.000    ?          0.000    ?          ?        0.810     0.001     frog
             0.000    0.000    ?          0.000    ?          ?        0.702     0.007     morepork_more-pork_part
             0.000    0.000    ?          0.000    ?          ?        0.656     0.004     chainsaw
             0.000    0.000    ?          0.000    ?          ?        1.000     1.000     crackle
             0.000    0.000    ?          0.000    ?          ?        0.758     0.001     car_horn

Weighted Avg.    0.673    0.182    ?          0.673    ?          ?        0.824     0.551

Temporary run directories: /tmp/autoweka1020608005622997004/

For better performance, try giving Auto-WEKA more time. Tried 580 configurations; to get good results reliably you may need to allow for trying thousands of configurations.

larskotthoff commented 4 years ago

Auto-WEKA runs involve a certain degree of randomization, so there are a few things that could have happened. First, it's possible that the first run simply got lucky and found a good configuration early on. Second, the two runs might have been evaluated on different test data sets. Did you use different data sets in the two runs? The training times on the evaluation data set are very different (0.186 seconds versus 2.835 seconds), so I would think that the data was different.

In this case, the first, quick run simply gave you a misleading result because it wasn't evaluated on a large and/or representative enough data set.
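One way to rule out the "different evaluation data" explanation is to score both chosen configurations (the RandomForest and the SMO) yourself on exactly the same folds. In Weka, `Evaluation.crossValidateModel(classifier, data, numFolds, new Random(seed))` takes the random generator explicitly, so reusing the same data and seed for each candidate should reproduce the same test folds. The plain-Java sketch below illustrates only the underlying idea without a Weka dependency. `FixedFolds` and `assignFolds` are hypothetical names, the 10 folds and seed 1 are arbitrary assumptions, and unlike Weka's cross-validation this version does not stratify by class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical helper: deterministic k-fold assignment. The same seed always
 * produces the same folds, so two candidate classifiers can be scored on
 * identical train/test splits.
 */
public class FixedFolds {

    /** Returns fold[i] = index of the test fold that instance i is assigned to. */
    static int[] assignFolds(int numInstances, int numFolds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            order.add(i);
        }
        // Seeded shuffle: the same seed gives the same permutation.
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[numInstances];
        for (int pos = 0; pos < numInstances; pos++) {
            // Deal shuffled instances round-robin so fold sizes differ by at most one.
            fold[order.get(pos)] = pos % numFolds;
        }
        return fold;
    }

    public static void main(String[] args) {
        // 3009 instances as in this thread; 10 folds and seed 1 are arbitrary choices.
        int[] forRandomForest = assignFolds(3009, 10, 1L);
        int[] forSmo = assignFolds(3009, 10, 1L);
        System.out.println(Arrays.equals(forRandomForest, forSmo)); // prints true
    }
}
```

Because both assignments come from the same seed, any remaining gap in accuracy between the two configurations can no longer be blamed on one of them having been tested on easier data.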