Rostlab / JS16_ProjectB_Group6

Game of Thrones characters are always in danger of being eliminated. The challenge in this assignment is to estimate how much risk of elimination the characters who are still alive face. The goal of this project is to rank characters by their Percentage Likelihood of Death (PLOD). You will assign a PLOD using machine learning approaches.
GNU General Public License v3.0

Stats about your project #60

Open gyachdav opened 8 years ago

gyachdav commented 8 years ago

Meysters, as part of the media blitz we're planning there will be a press release that will throw some _big numbers_ at the readers. Can you provide some impressive statistics about the data you processed to come up with your predictions? Something like: looked at 25 features for 2000 characters, totaling 500k data points. Anything that you think might be interesting IS interesting.

Hack3l commented 8 years ago

For 1949 characters we looked at 31 features (not all present for every character), resulting in 60,419 data points (if all characters had all features).
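
The figure quoted above is simply the product of the two counts, so it is an upper bound rather than the number of values actually present. A quick check, as a sketch (the variable names are illustrative and not taken from the project code):

```python
# Counts taken from the comment above.
num_characters = 1949
num_features = 31

# Upper bound on data points, assuming every character
# had a value for every feature (which is not the case).
max_data_points = num_characters * num_features
print(max_data_points)  # 60419
```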

goldbergtatyana commented 8 years ago

What features do those dead characters have in common that are misclassified as alive ones by your model?

goldbergtatyana commented 8 years ago

Looking at the features that contribute most to a prediction, can we tell whether a character described by a particular feature is more likely to die?

goldbergtatyana commented 8 years ago

Are women more likely to survive than men? One way of answering the question is:

goldbergtatyana commented 8 years ago

Given two opponent characters (e.g. Arya Stark and Cersei Lannister _or_ Sansa Stark and Ramsay Bolton), who is more likely to die next?

goldbergtatyana commented 8 years ago

Did Jon Snow eventually survive or not?

goldbergtatyana commented 8 years ago

What else?

subburamr commented 8 years ago

Here are some of the stats:

1. Major feature values contributing to death. In the results, the normalized weights of the attributes were reported. Based on those, a higher value corresponds to a higher chance of classifying a character as dead. Below are the highest and lowest values for some of the features:

   1. Culture
      - most likely to die: 1.0035 * (normalized) culture=Valyrian, 37/43 dead
      - least likely to die: -1.2471 * (normalized) culture=Ironmen, 0/5 dead
   2. Title
      - most likely to die: 1.4967 * (normalized) title=Prince of Dragonstone, 5/5 dead
      - least likely to die: -1.1306 * (normalized) title=Lord Commander of the Night's Watch, 2/9 dead

2. Men vs Women: Women are more likely to survive.

3. Opponent Characters. Below are the PLODs:

   - Arya Stark and Cersei Lannister
   - Sansa Stark and Ramsay Bolton
   - Jon Snow: based on the PLOD, Jon Snow is likely to be resurrected :smile:

4. Misclassified Dead People. 199 dead people have been misclassified as alive. Below is some information about them:

   - Missing values: 160 without date of birth/age, 130 without culture, 115 without titles
   - Common features: 29 Night's Watch, 14 Northmen, 12 Ironborn, 11 Free Folk
   - Other info: Among the 199 misclassified,

gyachdav commented 8 years ago

@marcusnovotny take a look. nice stats for your landing pages. hope this sparks new ideas.

gyachdav commented 8 years ago

Please also compile a list of the most dangerous houses and the "safest" houses. This ranking would be a simple average of PLODs for all characters grouped by houses.
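
The ranking described above (a simple average of PLODs grouped by house) could be sketched roughly as follows. This is not the project's `housePlod.js`; the function name, the tuple layout and the sample values are made up purely to show the shape of the calculation:

```python
from collections import defaultdict


def average_plod_by_house(characters):
    """Return {house: mean PLOD} for a list of (name, house, plod) tuples."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _name, house, plod in characters:
        totals[house] += plod
        counts[house] += 1
    return {house: totals[house] / counts[house] for house in totals}


# Made-up sample data, not real predictions.
sample = [
    ("Character A", "House X", 0.9),
    ("Character B", "House X", 0.5),
    ("Character C", "House Y", 0.1),
]

averages = average_plod_by_house(sample)

# Most dangerous house first; reverse the list for the "safest" houses.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
```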

ThuyNganTran commented 8 years ago

I got the average PLOD for the houses now but I still need to round and sort the PLOD values. It'll be up by tomorrow.

ThuyNganTran commented 8 years ago

Top Ten Most Dangerous Houses:

  - House Lannister of Casterly Rock PLOD: 0.982
  - Blacks PLOD: 0.977
  - House Moore PLOD: 0.94
  - House Egen PLOD: 0.937
  - Good Masters PLOD: 0.926
  - House Cassel PLOD: 0.917
  - Brave Companions PLOD: 0.908
  - Khal PLOD: 0.904
  - House Cockshaw PLOD: 0.886
  - House Celtigar PLOD: 0.869

Top Ten 'Safest' Houses:

  - House Humble PLOD: 0.007
  - House Merlyn PLOD: 0.006
  - Wise Masters PLOD: 0.004
  - House Codd PLOD: 0.004
  - House Myre PLOD: 0.003
  - House Farwynd PLOD: 0.002
  - House Stonetree PLOD: 0.002
  - House Tawney PLOD: 0.002
  - House Sparr PLOD: 0.002
  - House Goodbrother of Shatterstone PLOD: 0.002

gyachdav commented 8 years ago

Is there a complete list I can look at? The top ten hardly contains any known names that we can use on the landing pages.

ThuyNganTran commented 8 years ago

I can also just consider the pagerank of the house members when making up the top tens. Right now, I only executed the housePlod.js file in the 'stats' directory (branch housePLOD). I just committed the complete output here: https://github.com/Rostlab/JS16_ProjectB_Group6/commit/6131b9536fe9cdf10a46aaab74c8e19c233d4af3

gyachdav commented 8 years ago

Even better, yes! Have a list filtered for popularity based on the aggregated pagerank for a house.

gyachdav commented 8 years ago

@marcusnovotny check out this list https://github.com/Rostlab/JS16_ProjectB_Group6/commit/6131b9536fe9cdf10a46aaab74c8e19c233d4af3

Seems like we have a headline: "House Lannister to become extinct by the end of Song of Ice and Fire"

marcusnovotny commented 8 years ago

Looks like they're in good company though :smile:

ThuyNganTran commented 8 years ago

@marcusnovotny I have to redo that top ten... it turns out that the naming of the houses is not consistent, so better not use it for the landing page yet (e.g. for the Lannisters, there are 'House Lannister', 'House Lannister of Lannisport' and 'House Lannister of Casterly Rock').
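
One possible way to merge such name variants before averaging is to cut each name at the first " of ". This is only a sketch under that assumption; the thread does not say how the naming was actually reconciled, and the helper `canonical_house` is hypothetical. Note that this rule also merges cadet branches into their parent house, which may or may not be wanted.

```python
def canonical_house(name):
    """Collapse 'House Lannister of Casterly Rock' to 'House Lannister'.

    Assumption: everything after the first ' of ' is a seat or branch name.
    """
    return name.split(" of ", 1)[0].strip()


# The three variants mentioned in the comment above.
variants = [
    "House Lannister",
    "House Lannister of Lannisport",
    "House Lannister of Casterly Rock",
]
canonical = {canonical_house(v) for v in variants}
print(canonical)  # {'House Lannister'}
```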

ThuyNganTran commented 8 years ago

Fixed it. You can find the complete list of houses with at least one popular character ranked by their average PLOD here: https://github.com/Rostlab/JS16_ProjectB_Group6/blob/288d52dbc91cd531d9714c18347f797161135b07/stats/sortedHousePlod.json

gyachdav commented 8 years ago

:cry: and I really wanted the Rains of Castamere to become the perfect literary device. The extinction of the Lannisters would have been such a great Ice and Fire moment and really in line with the way GRRM develops the plot throughout the series. Oh well, we still have House Poole to speculate about :laughing:

goldbergtatyana commented 8 years ago

Hold on! Those characters who are already dead: what is their PLOD? Is it 100%? If yes - :muscle:

If no - the statistics are messed up!

goldbergtatyana commented 8 years ago

@ThuyNganTran two suggestions on how to improve the statistics from above:

  1. all dead characters who died before reaching 100 years old get a PLOD = 100%
  2. please compile the statistics according to the size of a house, i.e. one set of statistics for houses with one member, another for houses with two members, a third for houses with three members, and so on. The reason for doing this is that we cannot compare tiny houses with large houses the way we did above.

Please note - very important - consider only those characters as dead for your statistics who died a violent death, i.e. before reaching the age of 100 years old!

ThuyNganTran commented 8 years ago

I fixed the PLODs to 1 for dead characters who died before the age of 100. I also compiled a list for all houses and for only the popular houses (https://github.com/Rostlab/JS16_ProjectB_Group6/commit/e9d6013b71bde3f98fad73cac6543589551ec214). As for the size of the houses, I don't really have an idea of what you imagine it to be, since there should be a way to compare the houses to each other, despite their sizes.

goldbergtatyana commented 8 years ago

@ThuyNganTran can you please provide a link to the characters.csv file you are working with? Thanks

marcusnovotny commented 8 years ago

Hey @ThuyNganTran! I'm still working on the pages and really interested in your data. Is there more than the sortedHousePlod.json available?

I'm particularly interested in the following characters / houses: House Stark, House Lannister, Daenerys Targaryen, Ramsay Bolton, Theon Greyjoy, Hodor.

Thanks in advance.

ThuyNganTran commented 8 years ago

@goldbergtatyana here is the csv that I obtained from exporting an Excel file containing all relevant character data: https://github.com/Rostlab/JS16_ProjectB_Group6/blob/housePLOD/stats/characters.csv

@marcusnovotny I'll provide you with the following link: https://github.com/Rostlab/JS16_ProjectB_Group6/tree/housePLOD/stats It contains the ranking for all houses, the ranking for only the popular houses, and also the csv file that contains the PLOD for every single character, so you can just browse through to get the ones you need.

marcusnovotny commented 8 years ago

Thank you!

subburamr commented 8 years ago

@gyachdav @goldbergtatyana @sacdallago Just bringing to your notice that there could be inconsistencies in the character PLODs displayed on the webpage, for the reasons below.

  1. Dead characters misclassified as alive may have a PLOD < 100%. If PLOD is not applicable for dead characters, then this could be fixed by either A or F (when displaying the PLOD).
  2. Characters who have died in the TV show but are not yet dead in the books (e.g. Stannis Baratheon, Myrcella Baratheon, the Mountain). Since our prediction is based on the data from the books, these characters were treated as alive, although they have already died in the TV show.

goldbergtatyana commented 8 years ago

Ahh, my request was meant for group 7 - sorry @subburamr this is what happens when taking care of too many tasks at the same time. Everything is perfectly fine with your data :+1:

goldbergtatyana commented 8 years ago

@ThuyNganTran thanks Thuy for the nice work!

You probably noticed that we were surprised by important houses scoring so low. We took a look at the data and some things became clearer to us. For example, we should consider the size of a house before calling it dangerous: a house with 1 of 1 members killed is not more dangerous than a house where 9 of 10 members were killed.

Therefore, we want to ask you to please do the analysis of plod-calculation-per-house again and these should be your steps:

  1. sort the data by column house
  2. assign plod=1 to all dead characters of a house
  3. for all other characters (independent of age), get the plod from the corresponding column (as you did it before)
  4. make a sum of all plods of all characters of a house
  5. divide this sum by 105, the number of members of the largest house (Night's Watch), so that the plods of all houses are normalized against it. The result of sum/105 is then the value we are looking for.
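
The steps above can be sketched as follows, assuming each house is given as a list of (is_dead, predicted_plod) pairs. The function name and the sample houses are hypothetical; only the constant 105 and the "dead counts as plod=1" rule come from the steps above.

```python
LARGEST_HOUSE_SIZE = 105  # Night's Watch, per the steps above


def normalized_house_plod(members, largest=LARGEST_HOUSE_SIZE):
    """Sum the plods of a house (dead members count as 1) and divide by 105.

    members: list of (is_dead, predicted_plod) pairs.
    """
    total = sum(1.0 if is_dead else plod for is_dead, plod in members)
    return total / largest


# Made-up houses: 1 of 1 killed versus 9 of 10 killed.
# The survivor's predicted plod is set to 0.0 to keep the example simple.
one_of_one = normalized_house_plod([(True, 0.0)])
nine_of_ten = normalized_house_plod([(True, 0.0)] * 9 + [(False, 0.0)])
```

With a plain per-house average these two houses would score 1.0 and 0.9; dividing by the fixed size of 105 instead ranks the larger house as the more dangerous one (9/105 versus 1/105).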

ThuyNganTran commented 8 years ago

@goldbergtatyana As far as I understood your instructions, I just replaced the #ofHouseMembers with 105 (number of largest house) when dividing the accumulated plod. The results for popular and all houses are under this commit https://github.com/Rostlab/JS16_ProjectB_Group6/commit/1700117e7e7739a80441d32f19fd06dce11f6cdc. I still don't really get how that normalized value will be representative for the average plod of the houses though... wouldn't maybe working with the variance (amount of spread of the values around the average plod of each house) work as well? I get that this approach might be too impracticable though.

goldbergtatyana commented 8 years ago

Still unhappy with the results, cause:

  1. tapping into sophisticated statistics is too heavy for the simple task at hand
  2. way too many characters have no age estimates.

Therefore @ThuyNganTran let's repeat the exercise one more time - hopefully the last one :) - please do the following:

  1. ignore age entirely from now on
  2. assign to everyone who is dead plod=1 and the actual plod to everyone else
  3. sum these numbers up and divide the sum by the number of members in a house

It is simple I know, but this is the best we can do right now!

For reference, I get the following numbers for the following houses:

  - Stark (22 dead + 50 alive) = 40.246/72 = 0.558972222
  - Lannister (18 dead + 33 alive) = 0.575568627
  - Bolton (4 dead + 9 alive) = 0.529153846
  - Frey = 0.385927835
  - Martell = 0.300689655
  - Greyjoy = 0.592219512
  - Arryn = 0.57375
  - Tully = 0.947916667
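
As a sanity check, the Stark figure can be reproduced with a short sketch of the three steps. The individual PLODs of the 50 living Starks are not given in this thread, so the list below is a placeholder that only matches their implied sum (40.246 - 22 = 18.246); the function name is likewise hypothetical.

```python
def house_plod(num_dead, alive_plods):
    """Dead members count as plod=1; divide the sum by the house size."""
    total = num_dead + sum(alive_plods)
    members = num_dead + len(alive_plods)
    return total / members


# Placeholder values: 50 equal PLODs whose sum is the implied 18.246.
stark_alive = [18.246 / 50] * 50
stark = house_plod(22, stark_alive)
print(round(stark, 6))  # 0.558972
```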

Thanks so much!!

goldbergtatyana commented 8 years ago

Hi @ThuyNganTran, is it all clear how to get to the PLOD numbers of houses?

ThuyNganTran commented 8 years ago

Hi @goldbergtatyana I'm sorry for the late reply but I had a tutoring seminar the last three days and my laptop's power supply had a literal meltdown yesterday... anyway, I hope that this https://github.com/Rostlab/JS16_ProjectB_Group6/commit/e1616a6e99120072a7c63faaf0609faeb4e5aab3 is what you wanted (json lists are updated too @marcusnovotny )!

goldbergtatyana commented 8 years ago

Thanks for keeping up with the good work, @ThuyNganTran! The results look good and you can expect seeing them on got.show soon :)