Rostlab / JS16_ProjectB_Group6

Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to estimate how likely each character who is still alive is to be eliminated. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD), which you will assign using machine learning approaches.
GNU General Public License v3.0

Test if books are a good feature #2

Closed: Hack3l closed this issue 8 years ago.

Hack3l commented 8 years ago

Compare the SVMs with and without books, to decide if they are a good feature.

Hack3l commented 8 years ago

OUTPUT WITHOUT BOOKS - SMO, polynomial kernel:

=== Run information ===

Scheme:       weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Relation:     characters-weka.filters.unsupervised.attribute.Remove-R1
Instances:    2235
Attributes:   5
              culture
              allegiance
              born
              title
              isAlive
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used: Linear Kernel: K(x,y) = <x,y>

Classifier for classes: dead, alive

BinarySMO

Machine linear: showing attribute weights, not support vectors.

     0.1085 * (normalized) culture=ironborn

Number of kernel evaluations: 2198088 (85.762% cached)

Time taken to build model: 1.85 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        1729               77.3602 %
Incorrectly Classified Instances       506               22.6398 %
Kappa statistic                          0.1269
Mean absolute error                      0.2264
Root mean squared error                  0.4758
Relative absolute error                 65.1537 %
Root relative squared error            114.1771 %
Total Number of Instances             2235

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.138     0.043      0.479     0.138     0.214      0.547    dead
                 0.957     0.862      0.794     0.957     0.868      0.547    alive
Weighted Avg.    0.774     0.679      0.723     0.774     0.722      0.547

=== Confusion Matrix ===

    a    b   <-- classified as
   69  431 |    a = dead
   75 1660 |    b = alive
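As a cross-check, the Summary and per-class figures above can be recomputed from the confusion matrix alone. The sketch below is not Weka code; `weka_style_metrics` is a hypothetical helper. It assumes a two-class problem and hard 0/1 predictions (SMO without logistic calibration), in which case every misclassified instance contributes an error of exactly 1, so the mean absolute error equals the error rate and the RMSE is its square root. ROC Area needs ranking scores and is not recomputed.

```python
import math


def weka_style_metrics(cm, labels):
    """Recompute summary and per-class statistics from a 2x2 confusion matrix.

    cm[i][j] = number of instances of actual class labels[i] predicted as labels[j].
    Assumption: hard 0/1 predictions (no probability calibration), two classes.
    """
    n = sum(map(sum, cm))
    k = len(labels)
    correct = sum(cm[i][i] for i in range(k))
    accuracy = correct / n
    row_tot = [sum(cm[i]) for i in range(k)]                       # actual counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    chance = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)

    # Two classes, hard predictions: each error costs exactly 1, so MAE = error rate.
    mae = (n - correct) / n
    rmse = math.sqrt(mae)

    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        tp_rate = tp / row_tot[i]                        # also the recall
        fp_rate = (col_tot[i] - tp) / (n - row_tot[i])   # other-class instances predicted as this class
        precision = tp / col_tot[i] if col_tot[i] else 0.0
        denom = precision + tp_rate
        f_measure = 2 * precision * tp_rate / denom if denom else 0.0
        per_class[label] = (tp_rate, fp_rate, precision, tp_rate, f_measure)
    return accuracy, kappa, mae, rmse, per_class


# Run without books, polynomial kernel (rows/columns: dead, alive), copied from the output above.
acc, kappa, mae, rmse, per_class = weka_style_metrics([[69, 431], [75, 1660]], ["dead", "alive"])
print(f"accuracy {acc:.4%}  kappa {kappa:.4f}  MAE {mae:.4f}  RMSE {rmse:.4f}")
for label, values in per_class.items():
    print(label, " ".join(f"{v:.3f}" for v in values))
```

Rounded, these values match the Summary block and the TP Rate, FP Rate, Precision, Recall and F-Measure columns of this run, which confirms the reading of the confusion matrix (500 dead and 1735 alive characters).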

Hack3l commented 8 years ago

output_without_books_rbfkernel.txt
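For reference, the two kernels compared in this thread can be written as below. This is a sketch of the standard definitions, not a transcription of Weka's source. The PolyKernel options in the scheme line above are only -C 250007 (kernel cache size) and -E 1.0 (exponent), so the "polynomial" runs use the exponent-1 case, which is exactly the linear kernel reported in the model output. A linear machine reduces to a weighted sum of normalized attributes minus a threshold b, which is why SMO prints attribute weights instead of support vectors. The RBF width parameter γ used in the RBF runs is not shown in this thread.

```latex
K_{\mathrm{poly}}(x, y) = \langle x, y \rangle^{E}, \qquad E = 1 \;\Rightarrow\; K(x, y) = \langle x, y \rangle

f(x) = \sum_{j} w_j \, \tilde{x}_j - b, \qquad \tilde{x}_j = \text{normalized value of attribute } j

K_{\mathrm{rbf}}(x, y) = \exp\!\left(-\gamma \, \lVert x - y \rVert^{2}\right)
```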

subburamr commented 8 years ago

OUTPUT WITH BOOK ATTRIBUTES - SMO, polynomial kernel:

=== Run information ===

Scheme:       weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -C 250007 -E 1.0"
Relation:     characters-weka.filters.unsupervised.attribute.Remove-R1
Instances:    2235
Attributes:   10
              culture
              allegiance
              born
              title
              book1
              book2
              book3
              book4
              book5
              isAlive
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

SMO

Kernel used: Linear Kernel: K(x,y) = <x,y>

Classifier for classes: dead, alive

BinarySMO

Machine linear: showing attribute weights, not support vectors.

     0.1338 * (normalized) culture=ironborn

Number of kernel evaluations: 1973493 (91.705% cached)

Time taken to build model: 3.89 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances        1740               77.8523 %
Incorrectly Classified Instances       495               22.1477 %
Kappa statistic                          0.1369
Mean absolute error                      0.2215
Root mean squared error                  0.4706
Relative absolute error                 63.7373 %
Root relative squared error            112.9293 %
Total Number of Instances             2235

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                 0.138     0.037      0.519     0.138     0.218      0.551    dead
                 0.963     0.862      0.795     0.963     0.871      0.551    alive
Weighted Avg.    0.779     0.677      0.733     0.779     0.725      0.551

=== Confusion Matrix ===

    a    b   <-- classified as
   69  431 |    a = dead
   64 1671 |    b = alive
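Comparing the two polynomial-kernel confusion matrices cell by cell shows where the 11 extra correct predictions come from. The short sketch below only re-uses the counts printed in this thread: the dead row is identical in both runs (69 of 500 dead characters recognized either way), and the whole gain is 11 alive characters that were predicted dead without books and are predicted alive with books.

```python
# Confusion matrices copied from the two polynomial-kernel runs.
# Rows = actual (dead, alive), columns = predicted (dead, alive).
without_books = [[69, 431], [75, 1660]]
with_books = [[69, 431], [64, 1671]]


def correct(cm):
    """Number of correctly classified instances (sum of the diagonal)."""
    return sum(cm[i][i] for i in range(len(cm)))


# Cell-wise change when the book attributes are added.
delta = [[w - wo for w, wo in zip(row_w, row_wo)]
         for row_w, row_wo in zip(with_books, without_books)]
print("cell-wise change:", delta)
print("extra correct:", correct(with_books) - correct(without_books))
```

Because recall for the dead class stays at 0.138, the book attributes do not yet help with the class the project cares most about.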

subburamr commented 8 years ago

output_with_books_SMO_RBFKernel.txt

Hack3l commented 8 years ago

Comparison of all outputs

Run                                                 Correct            Incorrect         MAE       RMSE
POLY KERNEL WITHOUT BOOKS                           1729  77.3602 %    506  22.6398 %    0.2264    0.4758
RBF KERNEL WITHOUT BOOKS                            1735  77.6286 %    500  22.3714 %    0.2237    0.473
POLY KERNEL WITH BOOKS                              1740  77.8523 %    495  22.1477 %    0.2215    0.4706
RBF KERNEL WITH BOOKS                               1735  77.6286 %    500  22.3714 %    0.2237    0.473
POLY KERNEL WITH BOOLEAN BOOKS                      1747  78.1655 %    488  21.8345 %    0.2183    0.4673
RBF KERNEL WITH BOOLEAN BOOKS                       1735  77.6286 %    500  22.3714 %    0.2237    0.473
POLY KERNEL WITH BOOKS (WITHOUT MENTIONS)           1738  77.7629 %    497  22.2371 %    0.2224    0.4716
RBF KERNEL WITH BOOKS (WITHOUT MENTIONS)            1735  77.6286 %    500  22.3714 %    0.2237    0.473
POLY KERNEL WITH BOOLEAN BOOKS (WITHOUT MENTIONS)   1745  78.0761 %    490  21.9239 %    0.2192    0.4682
RBF KERNEL WITH BOOLEAN BOOKS (WITHOUT MENTIONS)    1735  77.6286 %    500  22.3714 %    0.2237    0.473
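Two observations help when reading this table. First, every RBF run has exactly 1735 correct predictions, which equals the number of alive characters (the alive row total in the confusion matrices), so those runs score the same accuracy as a trivial classifier that always answers "alive". Second, the spread between runs is small compared with sampling noise. The sketch below uses a simple binomial standard error, sqrt(p(1-p)/n); this is an assumption and only a rough yardstick, because cross-validated accuracy is not a plain binomial proportion.

```python
import math

N = 2235      # instances in every run
ALIVE = 1735  # actual "alive" instances (row total in the confusion matrices)

# Correctly classified counts copied from the comparison table.
correct = {
    "POLY, no books": 1729,
    "POLY, books": 1740,
    "POLY, boolean books": 1747,
    "POLY, books (no mentions)": 1738,
    "POLY, boolean books (no mentions)": 1745,
    "RBF, every feature set": 1735,
}

# Accuracy of the trivial classifier that always predicts "alive".
baseline = ALIVE / N
# Rough binomial standard error of an accuracy near the baseline.
# Assumption: ignores the dependence between cross-validation folds.
se = math.sqrt(baseline * (1 - baseline) / N)

for name, c in correct.items():
    acc = c / N
    gap = acc - baseline
    print(f"{name:35s} {acc:.4%}  gap {gap:+.4%}  ({gap / se:+.2f} SE)")
```

Under this rough yardstick even the best run (POLY with boolean books, 12 more correct predictions than the baseline) sits well under one standard error above always predicting "alive", so the ranking of feature sets should be read as tentative.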

subburamr commented 8 years ago

The data set with books as additional features (boolean values for character appearances in books) has been pushed upstream to the develop branch.
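The thread does not show how the non-boolean book attributes were encoded, so the following is only a hypothetical sketch of how boolean appearance flags could be derived. It assumes that each of `book1` to `book5` holds a non-negative number that is 0 when the character does not appear in that book; `to_boolean_books` and the example row are illustrative, and only the attribute names come from the Weka run information.

```python
# Hypothetical sketch; the real encoding of the book attributes is not shown in the thread.
# Assumption: book1..book5 hold non-negative numbers, 0 meaning "does not appear in this book".
BOOK_COLUMNS = ("book1", "book2", "book3", "book4", "book5")


def to_boolean_books(row: dict) -> dict:
    """Return a copy of a character row with every book attribute reduced to True/False."""
    converted = dict(row)
    for column in BOOK_COLUMNS:
        # float() also accepts numeric strings, as read from a CSV file.
        converted[column] = float(row[column]) > 0
    return converted


# Hypothetical example row.
example = {"culture": "ironborn", "book1": 0, "book2": 3, "book3": "1", "book4": 0, "book5": 2.5}
print(to_boolean_books(example))
```

Reducing each book to appears/does not appear discards any magnitude information, which is one plausible reason the boolean variants behave slightly differently from the raw ones in the comparison.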

Weka Results Summary:

Run                               Correct            Incorrect         MAE       RMSE
POLY KERNEL WITH BOOLEAN BOOKS    1747  78.1655 %    488  21.8345 %    0.2183    0.4673
RBF KERNEL WITH BOOLEAN BOOKS     1735  77.6286 %    500  22.3714 %    0.2237    0.473