Milestone project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.
This project aims to predict the mortality of characters in "Game of Thrones" using machine learning models. The model analyzes data about characters in the book series and predicts whether a character is likely to survive or die in the storyline.
Fatality prediction in this context is a binary classification task, where the outcome is either target=0 (the character survives) or target=1 (the character dies). By training on the data, we aim to create a model capable of providing insights into the fate of characters based on various features.
character-predictions_pose.csv, Game of Thrones, Data Society, data.world
Our initial model comparison involved three classifiers: DummyClassifier, Logistic Regression (LR), and Support Vector Classifier (SVC). Based on the evaluation metrics, LR emerged as the top-performing model. We therefore proceeded with hyperparameter optimization for LR to enhance its predictive capabilities.
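The comparison and tuning workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the use of `StandardScaler`, `GridSearchCV`, the grid of `C` values, 5-fold cross-validation, and the random seed are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the character data (hypothetical, not the real dataset).
X, y = make_classification(n_samples=600, n_features=8, random_state=522)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=522
)

# The three candidate classifiers named in the text.
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svc": make_pipeline(StandardScaler(), SVC()),
}

# Mean cross-validated accuracy on the training split for each candidate.
cv_scores = {}
for name, model in models.items():
    result = cross_validate(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = result["test_score"].mean()

# Tune the regularization strength C of the logistic regression
# (the grid of values is an assumption for this sketch).
search = GridSearchCV(
    models["lr"],
    param_grid={"logisticregression__C": [0.01, 0.1, 1, 10, 100]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the tuned model once on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print(cv_scores, search.best_params_, test_accuracy)
```

Keeping the test split untouched until the final `search.score` call means the reported test accuracy is not influenced by model selection or tuning.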
The optimized LR model was then evaluated on a test set of 390 instances, resulting in the following key metrics:
These metrics provide a comprehensive understanding of the model's performance in predicting character survival or fatality. Overall, the model's accuracy is fairly unimpressive, correctly predicting the fate of a character in only about half of all cases. While this might seem a bit disappointing, it did not particularly surprise or discourage us: George R. R. Martin is a celebrated author and master storyteller, and the fact that we can't easily predict whether a character will survive based on their attributes is a testament to the quality of his writing rather than the inadequacy of our model.
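For readers unfamiliar with how such classification metrics are derived, here is a small standard-library sketch. The labels and predictions are made up purely for illustration and are not the project's results; the helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels (1 = dead)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up example labels (hypothetical, not taken from the dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can hide how the model treats each class, which is why precision and recall for the positive class (here, target=1) are usually reported alongside it.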
Further refinement and exploration of additional features may be necessary to enhance predictive accuracy. Future considerations might involve more sophisticated models to improve performance.
The final report, published as a Jupyter Book, can be found here.
Docker is a container solution used to manage the software dependencies for this project. The Docker image used for this project is based on quay.io/jupyter/minimal-notebook:2023-11-19. Additional dependencies are specified in the Dockerfile.
Setting up to run this analysis via docker
Go to the root of this project in a command prompt and enter the following command to clean up the project (removing all files generated by previous runs of the analysis):
docker compose run got_fatality_predictor bash -c "make -C ./work clean"
To run the analysis, enter the following command in the terminal in the project root:
docker compose run got_fatality_predictor bash -c "make -C ./work all"
To develop this analysis in JupyterLab, run the following from the root of this repository:
docker compose up
Open your browser and type the following into the address bar:
localhost:8890
Then click on work/ to open the project directory.
To shut down the container and clean up the resources, press Ctrl + C in the terminal where you launched the container, and then run:
docker compose down
If you don't want to use Docker, then the first time you run the project, run the following from the root of this repository:
conda env create --file 522env.yaml -n GoT-fatality-prediction
Then activate the `GoT-fatality-prediction` environment and open the project in JupyterLab:
conda activate GoT-fatality-prediction
jupyter lab