Rostlab / JS16_ProjectB_Group6

Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to estimate how much risk of elimination the characters that are still alive face. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD). You will assign a PLOD using machine learning approaches.
GNU General Public License v3.0

Final Predictions #66

Open goldbergtatyana opened 8 years ago

goldbergtatyana commented 8 years ago

Did you provide the function returning PLODs to group A? Also, please upload the file with the final predictions for all characters here and in your model description (so that we don't need to scroll up and down through old posts to find the correct file). What are your top 10 characters? Please list them here as well. Thanks!

Hack3l commented 8 years ago

Final Prediction: weka_pred.json.txt

Top 10:

  1. Grazdan zo Galare 99.6
  2. Ghael 99.3
  3. Tytos Blackwood 99.3
  4. Tyrion Tanner 98.4
  5. Valaena Velaryon 98.3
  6. Lancel Lannister 98.2
  7. Zollo 98
  8. Tommen Baratheon 97.9
  9. Urswyck 97.9
  10. Willis Fell 97.8

subburamr commented 8 years ago

I have notified Project A to add the PLODs https://github.com/Rostlab/JS16_ProjectA/issues/120

gyachdav commented 8 years ago

What about top /popular/ 10?


subburamr commented 8 years ago

Top 10 Popular characters (norm. popularity > 0.34)

PLOD Name Normalized popularity
98.2 Lancel Lannister 0.47826087
97.9 Tommen Baratheon 1
97.2 Edmure Tully 0.655518395
96.4 Stannis Baratheon 1
95.9 Jeyne Poole 0.384615385
95.7 Edmyn Tully 0.357859532
95.3 Daenerys Targaryen 1
95.2 Illyrio Mopatis 0.394648829
94.5 Aurane Waters 0.377926421
94.2 Euron Greyjoy 0.739130435

PLOD for 20 Most Popular characters (norm. popularity > 0.75)

PLOD Name Normalized popularity
97.9 Tommen Baratheon 1
96.4 Stannis Baratheon 1
95.3 Daenerys Targaryen 1
91.8 Davos Seaworth 0.969899666
91.8 Petyr Baelish 1
74.1 Theon Greyjoy 1
71.1 Bran Stark 1
70.7 Tyrion Lannister 1
68.7 Samwell Tarly 0.969899666
68.5 Arya Stark 1
64.8 Margaery Tyrell 1
64.8 Jaime Lannister 1
61.3 Walder Frey 0.89632107
53.5 Varys 0.899665552
41.4 Barristan Selmy 1
28.9 Roose Bolton 0.97993311
18.7 Mace Tyrell 0.856187291
16.6 Cersei Lannister 1
11.6 Jon Snow 1
3.9 Sansa Stark 1
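
Both lists above come down to a filter on normalized popularity followed by a sort on PLOD. Here is a minimal sketch of that step. The record layout and the function name `top_by_plod` are assumptions made for illustration, not the group's actual code, and only a few rows from the tables are used.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    plod: float        # percentage likelihood of death, 0-100
    name: str
    popularity: float  # normalized popularity, 0-1


def top_by_plod(preds, min_popularity, limit):
    """Keep characters above the popularity cutoff, highest PLOD first."""
    popular = [p for p in preds if p.popularity > min_popularity]
    return sorted(popular, key=lambda p: p.plod, reverse=True)[:limit]


# A few rows copied from the tables above (not the full prediction file).
sample = [
    Prediction(98.2, "Lancel Lannister", 0.47826087),
    Prediction(97.9, "Tommen Baratheon", 1.0),
    Prediction(97.2, "Edmure Tully", 0.655518395),
    Prediction(94.2, "Euron Greyjoy", 0.739130435),
    Prediction(11.6, "Jon Snow", 1.0),
    Prediction(3.9, "Sansa Stark", 1.0),
]

ranked = top_by_plod(sample, min_popularity=0.75, limit=20)
for p in ranked:
    print(f"{p.plod:5.1f}  {p.name}")
```

With the 0.75 cutoff, only Tommen Baratheon, Jon Snow and Sansa Stark remain from this sample. Lowering the cutoff to 0.34 brings the other three rows back in.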

gyachdav commented 8 years ago

Jon Snow?


subburamr commented 8 years ago

I have updated the list to include all of the most popular characters, but the popularity values are based on old data.

goldbergtatyana commented 8 years ago

Hi Group 6, amazing news here!

Your prediction algorithm was published just now in the news of one of the largest Russian news agencies https://lenta.ru/news/2016/04/15/thewindsofwinter/

We are observing solid traffic coming from that direction and therefore I want to ask you to please tell me:

Thanks, and congrats on all the attention your algo is receiving!

Tatyana

subburamr commented 8 years ago

Hi Tatyana,

Here are the results on the full set (2028 characters) when using the same parameters: 2028_chars_weka_output.txt

=== Stratified cross-validation ===
=== Summary ===

    Correctly Classified Instances        1392       68.6391 %
    Incorrectly Classified Instances       636       31.3609 %
    Kappa statistic                          0.287
    Mean absolute error                      0.3432
    Root mean squared error                  0.448
    Relative absolute error                 90.3191 %
    Root relative squared error            102.7918 %
    Total Number of Instances             2028

=== Detailed Accuracy By Class ===

                   TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
                     0.707     0.375      0.846     0.707     0.771      0.743    1
                     0.625     0.293      0.422     0.625     0.504      0.743    0
    Weighted Avg.    0.686     0.354      0.738     0.686     0.703      0.743

=== Confusion Matrix ===

        a    b   <-- classified as
     1069  442 |    a = 1
      194  323 |    b = 0
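
The summary figures can be re-derived from the confusion matrix alone, which is a handy sanity check when comparing runs. The sketch below is plain arithmetic on the four counts reported by Weka, not Weka's own code. The variable names are illustrative. Class 0 has 517 instances, which matches the number of dead labels and suggests that class 0 is the "dead" class.

```python
# Confusion matrix from the Weka output: outer key = actual class,
# inner key = predicted class.
matrix = {
    "1": {"1": 1069, "0": 442},
    "0": {"1": 194, "0": 323},
}

total = sum(sum(row.values()) for row in matrix.values())
correct = sum(matrix[c][c] for c in matrix)
accuracy = correct / total


def per_class(label):
    """Return (precision, recall, F-measure) for one class."""
    tp = matrix[label][label]
    actual = sum(matrix[label].values())              # row total
    predicted = sum(matrix[a][label] for a in matrix)  # column total
    precision = tp / predicted
    recall = tp / actual
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Cohen's kappa: observed agreement corrected for chance agreement.
expected = sum(
    sum(matrix[c].values()) * sum(matrix[a][c] for a in matrix)
    for c in matrix
) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(f"accuracy = {accuracy:.4%}, kappa = {kappa:.3f}")
for label in matrix:
    p, r, f = per_class(label)
    print(f"class {label}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The recall on class 0 (0.625) means that roughly three in eight characters labeled dead are still predicted as alive under cross-validation.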

Previous prediction output: 1946_chars_final_pred.txt

We assigned dead labels to 517 (out of 2028) characters. A character is labeled dead if any one of the criteria below holds:

  1. The character has a dateofDeath.
  2. We set the current time reference to 305. If (timeReference - dateofBirth) >= 100, the character is assumed dead.
  3. There were 23 characters that didn't have a "Died" field in the awoiaf infobox but had a "Died in" field instead, e.g. Joffrey Baratheon. These 23 characters were hardcoded as dead: https://github.com/Rostlab/JS16_ProjectB_Group6/blob/develop/api-handler/to_arff.js#L18
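
The three criteria above can be sketched as a single predicate. This is an illustration in Python, not the code in to_arff.js. The dictionary keys `dateOfDeath`, `dateOfBirth` and `name`, and the function name `is_dead`, are assumptions. The hardcoded set is passed in as a parameter because only one of its 23 names appears in this thread.

```python
TIME_REFERENCE = 305  # current year used as the time reference
MAX_AGE = 100         # age from which a character is assumed dead


def is_dead(character, hardcoded_dead):
    """Apply the three labeling criteria; any one of them is sufficient."""
    # 1. An explicit date of death.
    if character.get("dateOfDeath") is not None:
        return True
    # 2. Born at least MAX_AGE years before the time reference.
    born = character.get("dateOfBirth")
    if born is not None and TIME_REFERENCE - born >= MAX_AGE:
        return True
    # 3. Listed among the characters hardcoded as dead.
    return character.get("name") in hardcoded_dead


# Only Joffrey Baratheon is named in the thread; the other 22 are not filled in.
hardcoded = {"Joffrey Baratheon"}

# Hypothetical records, for illustration only.
print(is_dead({"name": "Example A", "dateOfDeath": 299}, hardcoded))  # True
print(is_dead({"name": "Example B", "dateOfBirth": 205}, hardcoded))  # True
print(is_dead({"name": "Example C", "dateOfBirth": 280}, hardcoded))  # False
print(is_dead({"name": "Joffrey Baratheon"}, hardcoded))              # True
```

A character with neither date and no hardcoded entry falls through all three checks and is treated as alive.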

subburamr commented 8 years ago

Here is the PLOD value comparison for the major characters:

PLOD for 20 Most Popular characters

Name Old PLOD New PLOD
Tommen Baratheon 97.9 99.8
Stannis Baratheon 96.4 99.1
Daenerys Targaryen 95.3 99.8
Davos Seaworth 91.8 90.9
Petyr Baelish 91.8 92.6
Theon Greyjoy 74.1 79.1
Bran Stark 71.1 76.4
Tyrion Lannister 70.7 86.2
Samwell Tarly 68.7 68.6
Arya Stark 68.5 29.6
Margaery Tyrell 64.8 56.3
Jaime Lannister 64.8 75
Walder Frey 61.3 59.9
Varys 53.5 55
Barristan Selmy 41.4 57.4
Roose Bolton 28.9 75.8
Mace Tyrell 18.7 55.6
Cersei Lannister 16.6 55.3
Jon Snow 11.6 17.1
Sansa Stark 3.9 3

sacdallago commented 8 years ago

Hey all!

I fear that it would do more harm than good to change the predictions now that many sites are referencing us saying "Jon Snow has 11% likelihood of death" and then have it at something different :)

We will definitely put the most recent data somewhere in the near future (2 weeks?), but in the meantime: could you compile the old list + add only the predictions for the missing characters? That would be suuuuuuuuper awesome! :)

Thanks guys!

gyachdav commented 8 years ago

Group 6, The TUM press office made the following request for our press release: "... Is it possible to visualize the algorithm or some of its decision tree or … ? For most purposes we need wide format pictures...."

Do you have a nice looking figure of your algorithm, of the results, of the important features?

We need this ASAP as the press office will discuss the release on Monday and we plan to get it out by Tuesday.

@Hack3l @subburamr @juanmirocks

goldbergtatyana commented 8 years ago

Thanks @subburamr for explaining the death labels and providing the new results!

For now, yes, as @sacdallago says, all we need is a new JSON file with the predictions for 1946 characters from the old model (i.e. the predictions we have on got.show now) and the predictions for the remaining 82 characters from the new model. THANKS!

Oh, and please upload your Weka input file here as well.

subburamr commented 8 years ago

I had to update our code and re-run the algorithm because all characters were being identified as noble. The cause is that every character now has a titles attribute, which is an array of titles (previously only one title was present, and the title attribute was absent for characters without any title). In the updated code, only the first title of a character is considered.
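
A minimal sketch of the title handling described above, written in Python for illustration rather than taken from the project's code. The helper names `first_title` and `has_title`, and the idea that nobility was flagged by the mere presence of a title field, are assumptions inferred from the description.

```python
def first_title(character):
    """Return the first title, or None when the character has none."""
    titles = character.get("titles")
    if titles:                     # new format: non-empty list of titles
        return titles[0]
    return character.get("title")  # old format: single optional title


def has_title(character):
    """True only when an actual title exists, not just the field."""
    return first_title(character) is not None


# Hypothetical records covering the old and new data formats.
print(first_title({"titles": ["Ser", "Lord Commander"]}))  # Ser
print(first_title({"titles": []}))                          # None
print(first_title({"title": "Maester"}))                    # Maester
print(has_title({"titles": []}))                            # False
```

Checking only whether the `titles` key exists would return true for every record in the new format, which is consistent with the "everyone is noble" symptom.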

data.zip contains the following files:

  1. 2028_chars_old_pred.json - predictions for 1946 characters from the old model (i.e. the predictions we have on got.show now) and predictions for the remaining 82 characters from the new model
  2. 2028_chars_new_pred.json - predictions for all 2028 characters from the new model
  3. 1946_characters.arff - old Weka input file
  4. characters.arff - new Weka input file (2028 characters)
  5. 2028_chars_final_weka_output.txt - Weka output of the new model
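
The combined file in item 1 keeps the old prediction wherever one exists and falls back to the new model otherwise. A minimal sketch of that merge follows. It assumes each prediction set can be loaded as a mapping from character name to PLOD; the actual JSON layout is not shown in this thread. The function name `merge_predictions` and the "Hypothetical Newcomer" entry are illustrative.

```python
def merge_predictions(old, new):
    """Old-model PLODs where available, new-model PLODs for the rest."""
    merged = dict(new)   # start from the new model (all characters)
    merged.update(old)   # old predictions take precedence
    return merged


# Old and new values for two characters from the comparison table,
# plus one made-up character that only the new model covers.
old_pred = {"Jon Snow": 11.6, "Sansa Stark": 3.9}
new_pred = {"Jon Snow": 17.1, "Sansa Stark": 3.0, "Hypothetical Newcomer": 42.0}

combined = merge_predictions(old_pred, new_pred)
print(combined)
```

Jon Snow keeps the already published 11.6, and only the character missing from the old set takes its value from the new model.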