Open gyachdav opened 8 years ago
For 1949 characters we looked at 31 features (not all present for every character), resulting in up to 60,419 data points (if all characters had all features).
What features do those dead characters have in common that are misclassified as alive ones by your model?
Looking at the most contributing features (i.e. those most important for a prediction), can we tell whether a character described by a particular feature is more likely to die?
Are women more likely to survive than men? One way to answer the question is:
Given two opposing characters (e.g. Arya Stark and Cersei Lannister, or Sansa Stark and Ramsay Bolton), who is more likely to die next?
Did Jon Snow eventually survive or not?
What else?
Here are some of the stats:
1. Major feature values contributing to death
The results list the normalized weights of the attributes. A higher weight corresponds to a higher chance of a character being classified as dead.
Below are the highest and lowest weights for some of the features.
1. Culture
most likely to die
1.0035 * (normalized) culture=Valyrian
37/43 dead
least likely to die
-1.2471 * (normalized) culture=Ironmen
0/5 dead
2. Title
most likely to die
1.4967 * (normalized) title=Prince of Dragonstone
5/5 dead
least likely to die
-1.1306 * (normalized) title=Lord Commander of the Night's Watch
2/9 dead
2. Men vs Women: Women are more likely to survive.
3. Opponent Characters: below are the PLODs
Arya Stark and Cersei Lannister
Sansa Stark and Ramsay Bolton
Jon Snow: based on the PLOD, Jon Snow is likely to be resurrected :smile:
4. Misclassified Dead People
199 dead people have been misclassified as alive
Below is some information about them
Missing values
160 without date of birth/age
130 without culture
115 without titles
Common Features: 29 Night's Watch, 14 Northmen, 12 Ironborn, 11 Free Folk
Other Info: Among the 199 misclassified,
@marcusnovotny take a look. nice stats for your landing pages. hope this sparks new ideas.
Please also compile a list of the most dangerous houses and the "safest" houses. This ranking would be a simple average of PLODs for all characters grouped by houses.
I got the average PLOD for the houses now but I still need to round and sort the PLOD values. It'll be up by tomorrow.
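The grouping, rounding and sorting described above could look roughly like the sketch below. It is not the actual housePlod.js; the record shape (`name`, `house`, `plod`) and the function name `averagePlodByHouse` are assumptions made for illustration only.

```javascript
// Minimal sketch: average PLOD per house, rounded to 3 decimals, sorted descending.
// Assumed (hypothetical) record shape: { name, house, plod } with plod in [0, 1].
function averagePlodByHouse(characters) {
  const totals = new Map(); // house -> { sum, count }
  for (const { house, plod } of characters) {
    // Skip characters without a house or without a numeric PLOD.
    if (!house || typeof plod !== "number") continue;
    const entry = totals.get(house) || { sum: 0, count: 0 };
    entry.sum += plod;
    entry.count += 1;
    totals.set(house, entry);
  }
  return [...totals.entries()]
    .map(([house, { sum, count }]) => ({
      house,
      members: count,
      plod: Math.round((sum / count) * 1000) / 1000, // round to 3 decimals
    }))
    .sort((a, b) => b.plod - a.plod); // most dangerous house first
}

// Made-up sample data, only to show the call.
const sampleCharacters = [
  { name: "A", house: "House X", plod: 0.9 },
  { name: "B", house: "House X", plod: 0.7 },
  { name: "C", house: "House Y", plod: 0.1 },
  { name: "D", house: null, plod: 0.5 }, // ignored: no house
];
const houseRanking = averagePlodByHouse(sampleCharacters);
console.log(houseRanking);
```

Keeping `members` in the output makes it easy to see when a house's average rests on only one or two characters.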
Top Ten Most Dangerous Houses:
1. House Lannister of Casterly Rock, PLOD: 0.982
2. Blacks, PLOD: 0.977
3. House Moore, PLOD: 0.94
4. House Egen, PLOD: 0.937
5. Good Masters, PLOD: 0.926
6. House Cassel, PLOD: 0.917
7. Brave Companions, PLOD: 0.908
8. Khal, PLOD: 0.904
9. House Cockshaw, PLOD: 0.886
10. House Celtigar, PLOD: 0.869

Top Ten 'Safest' Houses:
1. House Humble, PLOD: 0.007
2. House Merlyn, PLOD: 0.006
3. Wise Masters, PLOD: 0.004
4. House Codd, PLOD: 0.004
5. House Myre, PLOD: 0.003
6. House Farwynd, PLOD: 0.002
7. House Stonetree, PLOD: 0.002
8. House Tawney, PLOD: 0.002
9. House Sparr, PLOD: 0.002
10. House Goodbrother of Shatterstone, PLOD: 0.002
Is there a complete list I can look at? The top tens hardly contain any known names that we can use on the landing pages.
I can also just consider the pagerank of the house members when making up the top tens. Right now, I only executed the housePlod.js file in the 'stats' directory (branch housePLOD). I just committed the complete output here: https://github.com/Rostlab/JS16_ProjectB_Group6/commit/6131b9536fe9cdf10a46aaab74c8e19c233d4af3
Even better, yes! Have a list filtered for popularity, based on the aggregated pagerank for a house.
@marcusnovotny check out this list https://github.com/Rostlab/JS16_ProjectB_Group6/commit/6131b9536fe9cdf10a46aaab74c8e19c233d4af3
Seems like we have a headline: "House Lannister to become extinct by the end of Song of Ice and Fire"
Looks like they're in good company though :smile:
@marcusnovotny I have to redo that top ten... It turns out that the naming of the houses is not consistent, so better not use it for the landing page yet (e.g. for the Lannisters there are 'House Lannister', 'House Lannister of Lannisport' and 'House Lannister of Casterly Rock').
Fixed it. You can find the complete list of houses with at least one popular character ranked by their average PLOD here: https://github.com/Rostlab/JS16_ProjectB_Group6/blob/288d52dbc91cd531d9714c18347f797161135b07/stats/sortedHousePlod.json
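For anyone hitting the same naming inconsistency: one possible way to merge the variants is to strip the seat from the house name before grouping. This is only a sketch under the assumption that variants follow the pattern "House <Name> of <Seat>"; it is not necessarily how the fix in the commit above was done, and `canonicalHouse` is a hypothetical helper name.

```javascript
// Sketch: collapse "House <Name> of <Seat>" to "House <Name>".
// Assumption: the house name itself is a single word; names that do not
// match the pattern (e.g. "Brave Companions") are returned unchanged.
function canonicalHouse(name) {
  return name.trim().replace(/^(House\s+\S+)\s+of\s+.+$/, "$1");
}

const houseVariants = [
  "House Lannister",
  "House Lannister of Lannisport",
  "House Lannister of Casterly Rock",
  "Brave Companions",
];
const canonicalNames = houseVariants.map(canonicalHouse);
console.log(canonicalNames);
```

Whether cadet branches should really be merged with the main house is a judgment call, so the merged and unmerged rankings are probably both worth keeping.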
:cry: and I really wanted the Rains of Castamere to become the perfect literary device. The extinction of the Lannisters would have been such a great Ice and Fire moment and really in line with the way GRRM develops the plot throughout the series. Oh well, we still have House Poole to speculate about :laughing:
Hold on! Those characters who are already dead: what is their PLOD? Is it 100%? If yes - :muscle:
If no - the statistics are messed up!
@ThuyNganTran two suggestions on how to improve the statistics from above:
Please note - very important - consider only those characters as dead for your statistics who died a violent death, i.e. before reaching the age of 100 years old!
I fixed the PLODs to 1 for dead characters who died before the age of 100. I also compiled a list for all houses and for only the popular houses (https://github.com/Rostlab/JS16_ProjectB_Group6/commit/e9d6013b71bde3f98fad73cac6543589551ec214). As for the size of the houses, I don't really have an idea of what you imagine it to be, since there should be a way to compare the houses to each other, despite their sizes.
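The rule of pinning the PLOD to 1 for characters who died before the age of 100 could be sketched as below. The field names (`isDead`, `ageAtDeath`, `plod`) are hypothetical, and treating dead characters of unknown age as "died before 100" is an assumption of this sketch, not something stated in the thread.

```javascript
// Sketch: override the predicted PLOD for characters who are already dead.
// Hypothetical fields: isDead (boolean), ageAtDeath (number or null), plod (0..1).
const AGE_LIMIT = 100; // deaths at or above this age are not counted as violent

function effectivePlod(character) {
  const { isDead, ageAtDeath, plod } = character;
  // Assumption: a dead character with unknown age counts as having died before 100.
  const diedBeforeLimit = ageAtDeath == null || ageAtDeath < AGE_LIMIT;
  return isDead && diedBeforeLimit ? 1 : plod;
}

// Made-up sample records, only to show the three cases.
const youngDead = effectivePlod({ isDead: true, ageAtDeath: 35, plod: 0.4 });
const oldDead = effectivePlod({ isDead: true, ageAtDeath: 102, plod: 0.4 });
const alive = effectivePlod({ isDead: false, ageAtDeath: null, plod: 0.4 });
console.log(youngDead, oldDead, alive);
```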
@ThuyNganTran can you please provide a link to the characters.csv file you are working with? Thanks
Hey @ThuyNganTran ! I'm still working on the pages and really interested in your data. Is there more than the sortedHousesPlod.json available?
I'm particularly interested in the following characters / houses: House Stark, House Lannister, Daenerys Targaryen, Ramsay Bolton, Theon Greyjoy, Hodor
Thanks in advance.
@goldbergtatyana here is the csv that I obtained from exporting an Excel file containing all relevant character data: https://github.com/Rostlab/JS16_ProjectB_Group6/blob/housePLOD/stats/characters.csv

@marcusnovotny I'll provide you with the following link: https://github.com/Rostlab/JS16_ProjectB_Group6/tree/housePLOD/stats It contains the ranking for all houses, for only the popular houses, and also the csv file that contains the PLOD for every single character, so you can just browse through to get the ones you need.
Thank you!
@gyachdav @goldbergtatyana @sacdallago Just bringing to your notice that there could be inconsistencies in the character PLODs displayed on the webpage, for the reasons below.
Ahh, my request was meant for group 7 - sorry @subburamr this is what happens when taking care of too many tasks at the same time. Everything is perfectly fine with your data :+1:
@ThuyNganTran thanks Thuy for the nice work!
You probably noticed that we were surprised by important houses scoring so low. We took a look at the data and some things became clearer to us. For example, we should consider the size of a house before calling it dangerous: a house with 1 out of 1 members killed is not more dangerous than a house where 9 of 10 members were killed.
Therefore, we want to ask you to please do the analysis of plod-calculation-per-house again and these should be your steps:
@goldbergtatyana As far as I understood your instructions, I just replaced the #ofHouseMembers with 105 (the size of the largest house) when dividing the accumulated PLOD. The results for popular and all houses are under this commit: https://github.com/Rostlab/JS16_ProjectB_Group6/commit/1700117e7e7739a80441d32f19fd06dce11f6cdc. I still don't really get how that normalized value will be representative of the average PLOD of the houses though... Wouldn't working with the variance (the spread of the values around the average PLOD of each house) work as well? I get that this approach might be too impractical though.
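To make the trade-off concrete, here is a small sketch comparing the plain average, the sum divided by the size of the largest house (105, as above), and the variance. The function name `houseStats` and the sample PLOD arrays are made up for illustration; the real numbers come from the commit linked above.

```javascript
// Sketch: three ways to summarize the PLODs of one house.
// largestHouseSize = 105 is taken from the comment above; everything else is hypothetical.
function houseStats(plods, largestHouseSize) {
  const n = plods.length;
  const sum = plods.reduce((acc, p) => acc + p, 0);
  const mean = sum / n;
  // Population variance: average squared distance from the house mean.
  const variance = plods.reduce((acc, p) => acc + (p - mean) ** 2, 0) / n;
  return {
    mean,                                   // plain average PLOD
    sizeNormalized: sum / largestHouseSize, // accumulated PLOD / largest house size
    variance,
  };
}

// A house with 1 of 1 members dead vs. a house with 9 of 10 members dead.
const tinyHouse = houseStats([1], 105);
const bigHouse = houseStats([1, 1, 1, 1, 1, 1, 1, 1, 1, 0], 105);
console.log(tinyHouse, bigHouse);
```

With the plain average the one-member house ranks above the ten-member house (1 vs. 0.9). Dividing by 105 reverses that (1/105 vs. 9/105), but the result then mostly reflects house size rather than an average PLOD, which is the concern raised above.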
Still unhappy with the results, because:
Therefore @ThuyNganTran let's repeat the exercise one more time - hopefully the last one :) - please do the following:
It is simple I know, but this is the best we can do right now!
For reference, I get the following numbers for the following houses:
Stark (22 dead + 50 alive) = 40.246/72 = 0.558972222
Lannister (18 dead + 33 alive) = 0.575568627
Bolton (4 dead + 9 alive) = 0.529153846
Frey = 0.385927835
Martell = 0.300689655
Greyjoy = 0.592219512
Arryn = 0.57375
Tully = 0.947916667
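As a quick check of the Stark figure: assuming each dead member counts with a PLOD of 1 (an inference from the earlier comments, not stated with these numbers), the 50 living Starks must contribute 40.246 - 22 = 18.246 in total. The helper name `housePlod` is hypothetical.

```javascript
// Sketch: house PLOD = (dead members counted as 1 + sum of PLODs of living members) / house size.
// Assumption: every dead member contributes exactly 1.
function housePlod(deadCount, alivePlodSum, aliveCount) {
  return (deadCount + alivePlodSum) / (deadCount + aliveCount);
}

// Stark: 22 dead + 50 alive, accumulated PLOD 40.246 (numbers from the comment above).
const starkAlivePlodSum = 40.246 - 22; // PLOD mass contributed by the living members
const starkPlod = housePlod(22, starkAlivePlodSum, 50);
console.log(starkPlod.toFixed(9));
```

This should agree with the 0.558972222 quoted for House Stark, up to floating-point rounding.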
Thanks so much!!
Hi @ThuyNganTran, is it all clear how to get to the PLOD numbers of houses?
Hi @goldbergtatyana I'm sorry for the late reply but I had a tutoring seminar the last three days and my laptop's power supply had a literal meltdown yesterday... anyway, I hope that this https://github.com/Rostlab/JS16_ProjectB_Group6/commit/e1616a6e99120072a7c63faaf0609faeb4e5aab3 is what you wanted (json lists are updated too @marcusnovotny )!
Thanks for keeping up the good work, @ThuyNganTran! The results look good and you can expect to see them on got.show soon :)
Meysters, as part of the media blitz we're planning, there will be a press release that will throw some _big numbers_ at the readers. Can you provide some impressive statistics about the data you processed to come up with your predictions? Something like "looked at 25 features for 2000 characters, totaling 50k data points". Anything that you think might be interesting IS interesting.