Team 6 Milestone Project:

Game of Thrones Character Fatality Predictor

Authors: Thomas Jian, Ian MacCarthy, Arturo Rey, Sifan Zhang

Milestone project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

🪐 Overview

This project aims to predict the mortality of characters in "Game of Thrones" using machine learning models. The model is designed to analyze data related to characters in the book series "Game of Thrones" and predict whether a character is likely to survive or die in the story line.

Fatality prediction in this context is a binary classification task, where the outcome is either target=0 (the character survives) or target=1 (the character dies). By training on the data, we aim to build a model that offers insight into the fate of characters based on their features.

📖 Data Source

character-predictions_pose.csv, Game of Thrones, Data Society, data.world

🧑‍💻 Model Comparison

Our initial model comparison involved three classifiers: DummyClassifier, Logistic Regression (LR), and Support Vector Classifier (SVC). Based on the evaluation metrics, LR emerged as the top-performing model. We therefore proceeded with hyperparameter optimization for LR to improve its predictive performance.
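As a rough illustration of the comparison and tuning steps described above, the following sketch uses scikit-learn on synthetic data. The synthetic dataset, the 5-fold cross-validation, the accuracy and F1 scoring, the scaling step, and the grid of C values are all assumptions made for this example. The project's actual pipeline and settings are defined in the repository and may differ.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the character data (assumption: NOT the real dataset).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.25, 0.75], random_state=522
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=522
)

# The three classifiers named in the text; preprocessing here is an assumption.
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svc": make_pipeline(StandardScaler(), SVC()),
}

# Compare the models by mean cross-validated accuracy and F1 on the training split.
cv_means = {}
for name, model in models.items():
    scores = cross_validate(
        model, X_train, y_train, cv=5, scoring=["accuracy", "f1"]
    )
    cv_means[name] = {
        "accuracy": scores["test_accuracy"].mean(),
        "f1": scores["test_f1"].mean(),
    }

# Tune the regularization strength C of the logistic regression step.
# make_pipeline names each step after its lowercased class name.
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(
    models["logistic_regression"], param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

print(cv_means)
print(search.best_params_)
```

Here cv_means holds the mean cross-validated scores for each model, and search.best_params_ holds the value of C selected by the grid search. The test split is left untouched until the final evaluation.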

⭐️ Test Set Evaluation

The optimized LR model was then evaluated on a test set of 390 instances, resulting in the following key metrics:

Accuracy: 0.63
Precision, Recall, and F1-score for Each Class:
- Dead (Class 1): Precision 0.85, Recall 0.62, F1-score 0.72 (Support: 294)
- Survive (Class 0): Precision 0.38, Recall 0.68, F1-score 0.47 (Support: 96)
Macro Average: Precision 0.61, Recall 0.65, F1-score 0.59
Weighted Average: Precision 0.73, Recall 0.63, F1-score 0.66
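The macro and weighted averages follow directly from the per-class rows. The short sketch below recomputes them from the rounded values reported above, so the results agree with the report only to within rounding. The variable names are illustrative and are not taken from the project code.

```python
# Per-class test metrics copied from the report: (precision, recall, f1, support).
per_class = {
    "dead (1)": (0.85, 0.62, 0.72, 294),
    "survive (0)": (0.38, 0.68, 0.47, 96),
}

total = sum(metrics[3] for metrics in per_class.values())


def macro(idx):
    """Unweighted mean of one metric across the classes."""
    return sum(m[idx] for m in per_class.values()) / len(per_class)


def weighted(idx):
    """Mean of one metric across the classes, weighted by class support."""
    return sum(m[idx] * m[3] for m in per_class.values()) / total


# Order within each list: precision, recall, f1.
macro_avg = [macro(i) for i in range(3)]
weighted_avg = [weighted(i) for i in range(3)]

# Overall accuracy equals the support-weighted recall.
accuracy = weighted(1)

# Share of the largest class in the test set: the accuracy a classifier would
# get by always predicting that class (assuming it is also the training majority).
majority_share = max(m[3] for m in per_class.values()) / total

print("macro:", [round(v, 3) for v in macro_avg])
print("weighted:", [round(v, 3) for v in weighted_avg])
print("accuracy:", round(accuracy, 3))
print("majority share:", round(majority_share, 3))
```

Because the classes are imbalanced (294 versus 96 instances), the weighted averages sit closer to the Dead class, while the macro averages treat both classes equally.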

📔 Test Summary

These metrics summarize the model's performance in predicting character survival or fatality. Overall, the model's accuracy is fairly unimpressive: it correctly predicts the fate of a character in only about 63% of cases. While this might seem a bit disappointing, it did not particularly surprise or discourage us. George R. R. Martin is a celebrated author and master storyteller, and the fact that we can't easily predict whether a character will survive based on their attributes is a testament to the quality of his writing rather than the inadequacy of our model.

🪜 Next Steps

Further refinement and exploration of additional features may be necessary to enhance predictive accuracy. Future considerations might involve more sophisticated models to improve performance.

🖨 Report

The final report, built as a Jupyter Book, can be found here.

💻 Dependencies

Docker is a container solution used to manage the software dependencies for this project. The Docker image used for this project is based on the quay.io/jupyter/minimal-notebook:2023-11-19 image. Additional dependencies are specified in the Dockerfile.

📋 Usage

Setup

Setting up to run this analysis via Docker

Install and launch Docker on your computer.
Clone this GitHub repository.

Running the analysis

In a terminal, go to the root of this project and enter the following command to clean up the project (removing all files generated by previous runs of the analysis):

docker compose run got_fatality_predictor bash -c "make -C ./work clean"

To run the analysis, enter the following command in the terminal in the project root:

docker compose run got_fatality_predictor bash -c "make -C ./work all"

Continue analysis development

To continue developing this analysis in JupyterLab, run the following from the root of this repository:

docker compose up

Open your browser and type the following into the address bar:

localhost:8890

Open the directory

Click on work/

Clean up

To shut down the container and clean up the resources, press Ctrl + C in the terminal where you launched the container, and then run:

docker compose down

Setting up to run this analysis via a conda environment

If you don't want to use Docker, then the first time you run the project, run the following from the root of this repository:

conda env create --file 522env.yaml -n GoT-fatality-prediction

Then activate the `GoT-fatality-prediction` environment and open the project with JupyterLab:

conda activate GoT-fatality-prediction
jupyter lab

📌 Reference

Data Society. 2016. Requests: Game of Thrones. https://data.world/data-society/game-of-thrones
Joel Östblom. 2023. DSCI531 Course Notes. https://pages.github.ubc.ca/MDS-2023-24/DSCI_531_viz-1_students/lectures/4-eda.html
Varada Kolhatkar. 2023. DSCI571 Course Notes. https://pages.github.ubc.ca/MDS-2023-24/DSCI_571_sup-learn-1_students/lectures/00_motivation-course-information.html
Joel Östblom. 2023. DSCI573 Course Notes. https://pages.github.ubc.ca/MDS-2023-24/DSCI_573_feat-model-select_students/README.html
