YolaKamalita / DAS-Group-10

Analyzing film attributes that can influence the rating through the application of Generalized Linear Model (GLM).
0 stars 1 forks source link

Analysis of IMDB Rating with GLM - Logistic Regression


In this analysis, the research question is to investigate which properties of films influence whether IMDB rating exceeding 7 or not. The Generalized Linear Model for binary response variables, Logistic Regression, will be used to investigate the relationship between binary rating (i.e., 1 if greater than 7 and 0 otherwise) and film properties.

Installation and Setup


R and quarto installation is required as the main tools for doing the statistical analysis and reporting. Here is the list of R packages which are needed to be installed in order to run the main .qmd file in the local machine:

install.packages(tidyverse)
install.packages(gt)
install.packages(skimr)
install.packages(knitr)
install.packages(corrplot)
install.packages(ggplot2)
install.packages(gridExtra)
install.packages(dplyr)
install.packages(stats)
install.packages(jtools)
install.packages(sjPlot)
install.packages(broom)
install.packages(huxtable)
install.packages(lmtest)
install.packages(zoo)

Data


IMDB rating dataset which is preprocessed by Data Analytics (Statistics) department from University of Glasgow as the playground dataset for learning purposes.

Film ID Year Length Budget Votes Genre Rating
49834 1963 107 11.4 225 Romance 3.1

Description:

Code structure


/DAS-Group-10
├── README.md
├── data/
│   └── dataset10.csv
├── pdf/
│   ├── Group_10_Presentation.pdf
│   ├── Group_10_qmd.pdf
│   └── raw/Group_10_Presentation.pptx
└── quarto/
    ├── R
    └── Group_10_Analysis.qmd

Results


In summary, budget and length have significant relationship to binary rating, whether greater than 7 or not. For Genre, two of them are not statistically significant: Animation and Romance.

Model selection steps

Future Work


Future studies should consider interaction terms between movie length and genre which provide insights into the nuanced effects these variables have on ratings.