Closed subburamr closed 8 years ago
Adding some results for a subset of instances.
Attributes used (attributes with >50% missing values removed), 16 in total:
name, title, male, culture, house, book1, book2, book3, book4, book5, isNoble, numDeadRelations, boolDeadRelations, isPopular, popularity, isAlive
Dataset 1: only instances with isPopular = 1 (normalized popularity score > 0.34). Number of instances = 115.
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.691 0.267 0.704 0.691 0.697 0.788 1
0.733 0.309 0.721 0.733 0.727 0.792 0
Weighted Avg. 0.713 0.289 0.713 0.713 0.713 0.79
=== Confusion Matrix ===
a b <-- classified as
38 17 | a = 1
16 44 | b = 0
Attachments: 1.OnlyPopular_poly.txt, 1.OnlyPopular_rbf.txt
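The per-class numbers above can be recovered directly from the confusion matrix. A minimal sketch in plain Python (no Weka needed; the matrix entries are the ones reported for dataset 1):

```python
def class_metrics(tp, fn, fp, tn):
    """Precision, recall and F-measure for the positive class
    of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Dataset 1 confusion matrix, rows = actual class (1 = alive, 0 = dead):
#   38 17 | a = 1
#   16 44 | b = 0
p, r, f = class_metrics(tp=38, fn=17, fp=16, tn=44)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.704 0.691 0.697
```

These match the class-1 row of the detailed accuracy table above.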
Dataset 2: either popular or has a title. Number of instances = 971.
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.823 0.454 0.818 0.823 0.82 0.756 1
0.546 0.177 0.556 0.546 0.551 0.756 0
Weighted Avg. 0.744 0.374 0.742 0.744 0.743 0.756
=== Confusion Matrix ===
a b <-- classified as
569 122 | a = 1
127 153 | b = 0
Attachments: 2.OnlyPopOrTitle_poly.txt, 2.OnlyPopOrTitle_rbf.txt
Dataset 3: instances that are (popular or have a title) and (have a culture or a house). Number of instances = 850.
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.754 0.435 0.808 0.754 0.78 0.725 1
0.565 0.246 0.486 0.565 0.522 0.725 0
Weighted Avg. 0.699 0.38 0.714 0.699 0.705 0.725
=== Confusion Matrix ===
a b <-- classified as
454 148 | a = 1
108 140 | b = 0
Attachments: 3.OnlyPopOrTitleandHorC_poly.txt, 3.OnlyPoporTitleandHorC_rbf.txt
Check this out for more features that were tested:
As I can see, the result for dataset 1 is the leading one, with an F-measure of 0.727. However, its data set is rather small (115 characters)... For each dataset, how do you compare to random when predicting new dead characters? Have you tried other ML classification algorithms?
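The comparison to random asked for here can be grounded in a trivial majority-class baseline, computed from the class counts alone. A small sketch (plain Python; the counts come from the dataset 2 confusion matrix, 691 alive vs. 280 dead):

```python
def majority_baseline_accuracy(n_alive, n_dead):
    """Accuracy of always predicting the majority class
    (and never predicting a single dead character)."""
    return max(n_alive, n_dead) / (n_alive + n_dead)

# Dataset 2: 691 alive, 280 dead, 971 instances in total
acc = majority_baseline_accuracy(691, 280)
print(round(acc, 3))  # 0.712
```

A useful model has to beat this 71.2% while also achieving non-zero recall on the dead class, which the baseline does not.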
Below are the results with Random Forest, which performed well on the smaller dataset but did not give a good result on the larger one.

Random Forest, Dataset 2: either popular or has the attribute title
=== Summary ===
Correctly Classified Instances 717 73.8414 %
Incorrectly Classified Instances 254 26.1586 %
Kappa statistic 0.1566
Mean absolute error 0.3783
Root mean squared error 0.4226
Relative absolute error 92.1235 %
Root relative squared error 93.281 %
Total Number of Instances 971
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.983 0.864 0.737 0.983 0.842 0.738 1
0.136 0.017 0.76 0.136 0.23 0.738 0
Weighted Avg. 0.738 0.62 0.744 0.738 0.666 0.738
=== Confusion Matrix ===
a b <-- classified as
679 12 | a = 1
242 38 | b = 0
Naive Bayes: the results were not better than SVM. Dataset 2: either popular or has the attribute title.
=== Summary ===
Correctly Classified Instances 721 74.2533 %
Incorrectly Classified Instances 250 25.7467 %
Kappa statistic 0.2965
Mean absolute error 0.2825
Root mean squared error 0.4657
Relative absolute error 68.7948 %
Root relative squared error 102.7997 %
Total Number of Instances 971
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.893 0.629 0.778 0.893 0.832 0.703 1
0.371 0.107 0.584 0.371 0.454 0.703 0
Weighted Avg. 0.743 0.478 0.722 0.743 0.723 0.703
=== Confusion Matrix ===
a b <-- classified as
617 74 | a = 1
176 104 | b = 0
One observation when training SVM on the different datasets was that the dead characters that get misclassified are mostly the same ones. For example, in the previous results, dataset 2 (127 dead misclassified) and dataset 3 (108 dead misclassified) had 102 misclassified characters in common. Even when trying different combinations of attributes, most of these characters remain misclassified.
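The overlap between the misclassified dead characters of two runs boils down to a set intersection. A hypothetical sketch (the character names here are stand-ins; in practice the two lists would come from the per-dataset misclassification reports):

```python
def common_misclassified(names_a, names_b):
    """Return the characters misclassified in both runs, sorted."""
    return sorted(set(names_a) & set(names_b))

# Toy example with made-up run outputs
ds2_missed = ["Aegon Frey", "Alliser Thorne", "Kevan Lannister"]
ds3_missed = ["Alliser Thorne", "Jon Arryn", "Kevan Lannister"]
print(common_misclassified(ds2_missed, ds3_missed))
# ['Alliser Thorne', 'Kevan Lannister']
```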
However, removing them from the dataset gives a good result.

Dataset 2: either popular or has the attribute title, with the misclassified instances removed.
=== Summary ===
Correctly Classified Instances 753 90.942 %
Incorrectly Classified Instances 75 9.058 %
Kappa statistic 0.649
Mean absolute error 0.094
Root mean squared error 0.274
Relative absolute error 32.8436 %
Root relative squared error 72.495 %
Total Number of Instances 828
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.971 0.385 0.924 0.971 0.947 0.941 1
0.615 0.029 0.815 0.615 0.701 0.941 0
Weighted Avg. 0.909 0.323 0.905 0.909 0.904 0.941
=== Confusion Matrix ===
a b <-- classified as
665 20 | a = 1
55 88 | b = 0
Yep, given that dead characters make up 17% of your data set, the prediction with Naive Bayes for dataset 2 is much better than random (62%, 88 correct out of 143). What are the misclassified instances? Do they share a common (missing?) set of features?
For the misclassified dead characters (137), some patterns emerge: missing values for culture (53), and most have the title "Ser" (64). However, removing the culture attribute, or removing other attributes one by one, did not reduce these misclassifications much. Below is the list of dead characters that get misclassified in the different datasets: dead_misclassified.xlsx
The result in the previous comment (dataset 2 with misclassified instances removed) was with the SMO polynomial kernel, and I noticed that SMO itself provided better results than the other classification algorithms. Here are some results with other algorithms on the full dataset: results_other_algorithm_fulldataset.txt
So far the best result was obtained using SMO with PolyKernel together with ThresholdSelector to automatically optimize the F-measure for the class "dead", and with missing values replaced by the mean or median. For the SMO kernel, changing the C value from 1 to 8 improved the filtered dataset but reduced the accuracy on the full dataset.
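An analogous pipeline can be sketched in scikit-learn; this is not the Weka setup itself, just an illustration of the same idea (mean imputation, a degree-1 polynomial SVM mirroring Weka's PolyKernel default exponent of 1, and a manual threshold sweep standing in for ThresholdSelector). The data here is synthetic:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for the character features; 1 = alive, 0 = dead
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = (X[:, 0] + 0.3 * rng.rand(200) > 0.6).astype(int)
X[rng.rand(200, 5) < 0.1] = np.nan  # simulate missing values

clf = make_pipeline(
    SimpleImputer(strategy="mean"),  # "replace missing values with mean"
    StandardScaler(),
    SVC(kernel="poly", degree=1, C=1.0, probability=True, random_state=0),
)
clf.fit(X, y)

# Sweep the decision threshold to maximize the F-measure for class
# "dead" (0) -- the step that Weka's ThresholdSelector automates.
proba_dead = clf.predict_proba(X)[:, 0]  # classes_ order is [0, 1]
best_f1, best_t = max(
    (f1_score(y, (proba_dead < t).astype(int), pos_label=0), t)
    for t in np.arange(0.05, 0.95, 0.05)
)
print(round(best_f1, 3), round(best_t, 2))
```

Evaluating the threshold on the training data, as here, is only for illustration; the real setup should tune it under cross-validation.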
Full Dataset
=== Summary ===
Correctly Classified Instances 1433 73.6382 %
Incorrectly Classified Instances 513 26.3618 %
Kappa statistic 0.3544
Mean absolute error 0.32
Root mean squared error 0.4307
Relative absolute error 84.3261 %
Root relative squared error 98.902 %
Total Number of Instances 1946
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.784 0.402 0.851 0.784 0.816 0.76 1
0.598 0.216 0.485 0.598 0.536 0.76 0
Weighted Avg. 0.736 0.355 0.758 0.736 0.745 0.76
=== Confusion Matrix ===
a b <-- classified as
1137 314 | a = 1
199 296 | b = 0
Filtered Dataset: only instances with a title or popular.
=== Summary ===
Correctly Classified Instances 717 73.8414 %
Incorrectly Classified Instances 254 26.1586 %
Kappa statistic 0.396
Mean absolute error 0.2794
Root mean squared error 0.4338
Relative absolute error 68.0424 %
Root relative squared error 95.7574 %
Total Number of Instances 971
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.779 0.361 0.842 0.779 0.809 0.79 1
0.639 0.221 0.539 0.639 0.585 0.79 0
Weighted Avg. 0.738 0.321 0.755 0.738 0.744 0.79
=== Confusion Matrix ===
a b <-- classified as
538 153 | a = 1
101 179 | b = 0
Attachments: Poly_ThresholdSelector_fulldataset.txt, Poly_ThresholdSelector_PopOrTitle_dataset.txt
Below are the attributes that were used.

Attribute Ranking (score, attribute index, name):
0.09480986639259702 13 book4
0.07291363981192191 8 house
0.05979625565956465 4 culture
0.052158273381294196 14 book5
0.043165467625899026 20 isNoble
0.043165467625899026 3 male
0.03491544426796254 2 title
0.028145764263670943 21 age
0.027749229188078366 12 book3
0.024768756423432896 19 isMarried
0.01783144912641327 18 isAliveSpouse
0.01505652620760542 11 book2
0.013514902363823281 23 boolDeadRelations
0.01228160328879757 10 book1
0.010739655549845857 25 popularity
0.004367934224049325 16 isAliveFather
0.0038540596094552874 24 isPopular
0.0036485097636176724 15 isAliveMother
0.0034943473792394615 17 isAliveHeir
0.0030524152106885974 22 numDeadRelations
0.0010731507869434253 9 spouse
0.0001523361764684993536 6 father
0.0001518796244910712576 5 mother
0.00010044143503417536 7 heir
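For context, a ranking like the one above can be illustrated with a minimal Relief implementation (the basic two-class variant, not Weka's ReliefF with k neighbors; the toy data below is mine, not the character dataset):

```python
import numpy as np

def relief(X, y):
    """Basic Relief weights for a binary-class dataset: for each
    instance, reward features that differ on the nearest miss
    (other class) and agree on the nearest hit (same class)."""
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)  # normalize feature differences
    w = np.zeros(d)
    for i in range(n):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the instance itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / span / n
    return w

# Toy data: feature 0 tracks the class, feature 1 is noise
X = np.array([[0.0, 0.2], [0.1, 0.9], [0.05, 0.5],
              [1.0, 0.3], [0.9, 0.8], [0.95, 0.1]])
y = np.array([0, 0, 0, 1, 1, 1])
w = relief(X, y)
# The informative feature gets the larger weight: w[0] > w[1]
```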
Of all the attempts listed here to find an optimal model, the SMO PolyKernel model on the full data set is the best one. I think we can stop here and use the results of this model for got.show.
What I did not understand, however, is why the results of this model differ between the post from 4 days ago:
=== Confusion Matrix ===
a b <-- classified as
1129 322 | a = 1
217 278 | b = 0
and the one from one day ago:
a b <-- classified as
1137 314 | a = 1
199 296 | b = 0
This discussion exceeds my knowledge, but the title of this issue is:
choose optimal features and parameters for predicting PLOD
Which is definitely something that should be closed after a feature freeze.
Yes, the delivery is tomorrow - no improvements should be made after tomorrow. However, today you @subburamr could try one more thing:
Take the predictions of your final model on the full data set. Each character has a prediction of being DEAD and a corresponding PLOD. As of now (i.e. by default), a PLOD of 50 discriminates dead from alive. Will the performance of your model improve if you lower the PLOD threshold?
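The suggested experiment amounts to sweeping the PLOD cutoff and recomputing accuracy at each value. A minimal sketch in plain Python (the PLOD values and labels below are made up for illustration, chosen so that lowering the cutoff catches two extra dead characters):

```python
def accuracy_at_threshold(plods, labels, threshold):
    """Predict DEAD when PLOD >= threshold and return accuracy.
    labels: 1 = alive, 0 = dead; plods are percentages."""
    correct = sum(
        1 for plod, label in zip(plods, labels)
        if (plod >= threshold) == (label == 0)
    )
    return correct / len(labels)

# Hypothetical PLOD values and true labels
plods = [80, 62, 40, 38, 30, 20, 10]
labels = [0, 0, 0, 0, 1, 1, 1]
for t in (50, 35):
    print(t, round(accuracy_at_threshold(plods, labels, t), 3))
# Two dead characters with PLODs of 40 and 38 are missed at
# threshold 50 but caught at threshold 35.
```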
Below are the differences between the latest result and the first result.
@sacdallago Since we were getting feedback on the results and trying to choose which features to use among our collected features, I had not closed this issue. However, we have not added any new feature, or even pushed a new commit, since the feature freeze date :smiley:
@goldbergtatyana By reducing the threshold value, the model improves at classifying dead characters.
Here is the summary with a threshold value of 35%; testing with a threshold lower than this seems to hurt the overall performance.
Summary with threshold value 35%:
Correctly classified Instances 71.0688591984 %
Incorrectly classified Instances 28.9311408016 %
=== Confusion Matrix ===
Prediction
1053 398 | Alive
165 330 | Dead
######### Classification Report #########
precision recall f1-score support
Dead 0.45 0.67 0.54 495
Alive 0.86 0.73 0.79 1451
avg / total 0.76 0.71 0.73 1946
Below file contains reports for the different threshold values. plod_varying_thresholds.txt
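The report above has the shape of scikit-learn's `classification_report`, and its numbers can be cross-checked from the confusion matrix alone by reconstructing label vectors from the counts (this rebuilds equivalent labels, not the real predictions):

```python
from sklearn.metrics import classification_report

# From the threshold-35% confusion matrix:
# alive (1): 1053 predicted alive, 398 predicted dead
# dead  (0): 165 predicted alive, 330 predicted dead
y_true = [1] * 1451 + [0] * 495
y_pred = [1] * 1053 + [0] * 398 + [1] * 165 + [0] * 330
print(classification_report(y_true, y_pred, target_names=["Dead", "Alive"]))
```

This reproduces the Dead precision of 0.45 (330/728) and recall of 0.67 (330/495) reported above.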
@subburamr :D well, if you found a better predictor, you can still push that, but then close this issue (by today) and rest at least 'til the 10th of April! You can continue this discussion in the future by opening a new issue "choose better predictions than v1.0.0", and then work on this repo as much as you wish :dancer: https://github.com/Rostlab/JS16_ProjectE/issues/19
@subburamr the improvement in classifying dead comes at the cost of correctly classified alive. Therefore, the default threshold of 50 should be the one to use. Please forward your results (a function that provides a PLOD for each character) _today_ to group A, and please write here a short summary of how you developed your prediction model.
@subburamr In the final prediction model, the "Ranking of Attributes using Relief F score" lists 24 features. Is this your total number of features and not 26 as written in the description?
Also, what do the abbreviations stand for:
Finally, what do exactly "related to dead" and "number dead relations" mean?
Thank you!
Yes, the number of attributes used is 24. I have now updated the description. In the Weka results, the name and isAlive label are added as attributes, making it 26, so we had reported that we were using 26 attributes. However, name is removed via the Remove filter before running the SVM classifier and is used only for reporting the class probability. Final prediction output for reference: final_output.txt
GoT, CoK, SoS, FfC, DwD are abbreviations for the 5 books (Game of Thrones, Clash of Kings, etc.), and they indicate whether a character appeared in a specific book (value 1 if appeared, else 0).
Kernel: 1. polykernel
Attachments: 3.attributeranking.txt, 3.polykernel.txt, 3.rbfkernel.txt.gz