The streaming platform BHO is constantly looking to improve the quality of its content and the satisfaction of its users. Our project applies data analysis techniques to identify the most popular and best-rated movies and short films from 2010 to date. This will help BHO make informed decisions about what content to promote and highlight on its platform.
Install Python:
Install Jupyter Notebook:
pip install notebook
Install Visual Studio Code:
Install Visual Studio Code extensions: press Ctrl+Shift+X to open the Extensions view.
Install MySQL and MySQL Workbench:
Install the CinemExtract database: cinem_extract.sql
Execute queries: open the queries_cinema_extract.sql file and run the SQL queries to get results.
MoviesDataset API
Make requests to this API and extract the relevant information about the movies.
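The request step could look roughly like the sketch below. It is only an illustration under stated assumptions: the field names (`title`, `year`, `type`) and the sample records are hypothetical, and the real endpoint URL is not given here, so `fetch_movies` takes it as a parameter and is not called. The 2010 cutoff comes from the project goal.

```python
import json
import urllib.request


def fetch_movies(url):
    """Download and decode a JSON list of movie records from `url`."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def select_fields(records, fields, min_year=2010):
    """Keep records released in `min_year` or later, reduced to `fields`."""
    selected = []
    for record in records:
        year = record.get("year")
        if year is None or int(year) < min_year:
            continue  # skip records with no year or older than the cutoff
        selected.append({name: record.get(name) for name in fields})
    return selected


# Hypothetical sample shaped like an API response, used instead of a live call.
sample = [
    {"title": "Example A", "year": 2012, "type": "Movie", "extra": 1},
    {"title": "Example B", "year": 2005, "type": "Short", "extra": 2},
]
movies = select_fields(sample, ["title", "year", "type"])
print(movies)
```

Keeping the download and the field selection in separate functions makes it possible to check the selection logic on saved responses without calling the API again.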
Use Selenium, a web automation tool, to browse the movie review websites IMDb and Rotten Tomatoes and extract the relevant information for each movie.
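A minimal sketch of this kind of scraping, assuming Chrome and the `selenium` package are installed. The URL and the CSS selector are left as parameters because the exact pages and elements are not specified here, and the idea that the scraped value is a score string such as "8.1/10" is an assumption for illustration. The helper names `parse_score` and `scrape_text` are hypothetical.

```python
import re


def parse_score(text):
    """Return the first number found in a scraped score string, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def scrape_text(url, css_selector, timeout=10):
    """Open `url` in Chrome and return the text of the first matching element."""
    # Selenium is imported here so parse_score stays usable without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait for the element instead of sleeping for a fixed time.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()  # always close the browser, even on errors
```

A call would then look like `parse_score(scrape_text(url, selector))`, with `url` and `selector` chosen after inspecting the target page in the browser's developer tools.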
Using Selenium, extract information about the top 10 actors of each movie obtained in phase 1 (from the page and IMDb).
Work with the Beautiful Soup library to extract the relevant information from the Oscar Awards tables from 2000 onward.
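As an illustration of table extraction with Beautiful Soup, the sketch below turns an HTML table into a list of dictionaries and keeps the rows from 2000 onward. The sample markup, its column names (`Year`, `Category`, `Winner`) and the helper `parse_table` are hypothetical; real award pages will have different columns and structure, so the parsing would need adapting.

```python
from bs4 import BeautifulSoup


def parse_table(html):
    """Turn the first HTML table into a list of dicts keyed by header text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        # The header row has no <td> cells, so it is skipped here.
        if len(cells) == len(headers):
            rows.append(dict(zip(headers, cells)))
    return rows


# Hypothetical table; real award pages will have other columns and markup.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Category</th><th>Winner</th></tr>
  <tr><td>1999</td><td>Category X</td><td>Film A</td></tr>
  <tr><td>2001</td><td>Category X</td><td>Film B</td></tr>
</table>
"""

awards = [row for row in parse_table(SAMPLE_HTML) if int(row["Year"]) >= 2000]
print(awards)
```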
Using SQL, design the structure the database needs to store all the collected information, then create the tables and the relationships between them.
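One possible shape for such a design is sketched below. It is not the project's actual schema: the table and column names are hypothetical, and SQLite (from the Python standard library) stands in for MySQL so the example runs without a server. The point is the junction table, which models the many-to-many link between movies and actors.

```python
import sqlite3

SCHEMA = """
CREATE TABLE movies (
    movie_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    year     INTEGER
);
CREATE TABLE actors (
    actor_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
-- Junction table: a movie has many actors and an actor many movies.
CREATE TABLE movie_actors (
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    actor_id INTEGER NOT NULL REFERENCES actors(actor_id),
    PRIMARY KEY (movie_id, actor_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

In MySQL the same idea applies, although column types and the foreign key syntax would need adjusting.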
Insert all the data into the database designed in the previous step.
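A small sketch of a bulk insert, again using SQLite as a stand-in for MySQL so it runs as-is. The table layout and the sample rows are hypothetical. The technique shown is a parameterized `executemany`, which passes values separately from the SQL text.

```python
import sqlite3

# Hypothetical rows standing in for the data collected in earlier phases.
movies = [
    ("Example A", 2012),
    ("Example B", 2018),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies ("
    "movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)

# Placeholders let the driver handle quoting; avoid building SQL with string formatting.
with conn:  # commits on success, rolls back if an insert fails
    conn.executemany("INSERT INTO movies (title, year) VALUES (?, ?)", movies)

inserted = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
print(inserted)
```

Python MySQL drivers such as mysql-connector-python and PyMySQL follow the same pattern but use `%s` as the placeholder instead of `?`.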
With the data stored in the database, perform SQL queries to retrieve specific information and answer the project's questions.
Women in films is a fictitious company formed by three students from the Adalab Data Analysis Bootcamp who work together on the CinemExtract project from Module 2 of the Bootcamp. Thank you for reading, and we hope you find our project useful!
Thanks to the Adalab teachers for their attention throughout the project, and to our classmates for their support and for sharing.