Sharktrix / MyProjects


Movies and Ratings Analysis #1

Open Sharktrix opened 10 months ago

Sharktrix commented 10 months ago

Let's create a small project called "Movies and Ratings Analysis" using SQL, MySQL, and Python. The project analyzes data about movies and their ratings.

Step 1: Setting up the Database

Create a MySQL database called "movie_database" and import a dataset such as "movies.csv", which contains movie information such as the title, genre, director, actors, and year of release.

Alternatively, you can create the tables using SQL queries:


CREATE DATABASE movie_database;
USE movie_database;

CREATE TABLE movies (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    genre VARCHAR(255),
    director VARCHAR(255),
    actors VARCHAR(255),
    year INT
);

CREATE TABLE ratings (
    id INT AUTO_INCREMENT PRIMARY KEY,
    movie_id INT,
    rating INT,
    FOREIGN KEY (movie_id) REFERENCES movies(id)
);

Step 2: Loading Data into the Database

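You can load data into the MySQL database using the LOAD DATA INFILE statement. The statement reads the file on the server host, and without a column list it expects the CSV columns in the same order as the table columns.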

For the "movies" table:


LOAD DATA INFILE 'movies.csv'
INTO TABLE movies
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

For the "ratings" table:

LOAD DATA INFILE 'ratings.csv'
INTO TABLE ratings
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
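
If LOAD DATA INFILE is not available to you (for example, the server's secure_file_priv setting can restrict which files it may read), you can also insert the rows from Python. The following is only a minimal sketch under stated assumptions: the CSV header names (movie_id, rating) and the sample rows are made up for illustration, and an in-memory sqlite3 database stands in for MySQL so the code runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for ratings.csv; the header names are assumed.
SAMPLE_CSV = "movie_id,rating\n1,8\n1,6\n2,9\n"


def read_ratings(csv_file):
    """Yield (movie_id, rating) tuples from a CSV file object with a header row."""
    for row in csv.DictReader(csv_file):
        yield int(row["movie_id"]), int(row["rating"])


# sqlite3 (in memory) stands in for MySQL here; this is not the MySQL schema itself.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, movie_id INT, rating INT)"
)

rows = list(read_ratings(io.StringIO(SAMPLE_CSV)))

# Insert all rows in one call. With pymysql the placeholders would be %s, not ?.
conn.executemany("INSERT INTO ratings (movie_id, rating) VALUES (?, ?)", rows)
conn.commit()

loaded = conn.execute("SELECT COUNT(*) FROM ratings").fetchone()[0]
print(loaded)
```

With a real file you would pass open("ratings.csv", newline="") to read_ratings instead of the in-memory sample.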

Step 3: Performing SQL Queries using Python

Use the pymysql package to connect to the MySQL database and perform SQL queries.

Here's an example of how to fetch movie data and their ratings:


import pymysql

# Connect to the MySQL database
db = pymysql.connect(host="localhost", user="root", password="password", database="movie_database")

# Create a cursor object
cursor = db.cursor()

# Execute the SQL query
sql = """
SELECT m.title, m.genre, m.director, m.actors, m.year, r.rating
FROM movies m
JOIN ratings r ON m.id = r.movie_id
"""
cursor.execute(sql)

# Fetch all the rows and print them
result = cursor.fetchall()
for row in result:
    print(row)

# Close the connection
db.close()
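
The query above returns one row per rating. If you only need a summary per movie, you can let the database do the aggregation with GROUP BY. Here is a rough sketch of that idea; the two movies and three ratings are invented sample data, and sqlite3 is used as a stand-in for MySQL so the example is self-contained.

```python
import sqlite3

# In-memory sqlite3 database as a stand-in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL, genre TEXT);
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    movie_id INT REFERENCES movies(id),
    rating INT
);
-- Made-up sample rows, for illustration only.
INSERT INTO movies (id, title, genre) VALUES (1, 'Movie A', 'Drama'), (2, 'Movie B', 'Comedy');
INSERT INTO ratings (movie_id, rating) VALUES (1, 8), (1, 6), (2, 9);
""")

# Average rating and number of ratings per movie, best-rated first.
sql = """
SELECT m.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
FROM movies m
JOIN ratings r ON m.id = r.movie_id
GROUP BY m.id, m.title
ORDER BY avg_rating DESC
"""
per_movie = conn.execute(sql).fetchall()

for title, avg_rating, num_ratings in per_movie:
    print(title, avg_rating, num_ratings)

conn.close()
```

The same SELECT statement should also work when passed to cursor.execute() on the pymysql connection.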

Step 4: Analyzing the Data

Use Python libraries such as pandas, NumPy, matplotlib, and seaborn to analyze the movie and ratings data.

For example, you can compute the average rating per genre and plot it:


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Convert the SQL result to a pandas DataFrame
df = pd.DataFrame(result, columns=['Title', 'Genre', 'Director', 'Actors', 'Year', 'Rating'])

# Calculate the average rating for each genre
average_ratings = df.groupby('Genre')['Rating'].mean()

# Create a bar plot to visualize the average ratings
plt.figure(figsize=(12, 6))
sns.barplot(x=average_ratings.index, y=average_ratings.values)
plt.title('Average Rating by Genre')
plt.xlabel('Genre')
plt.ylabel('Rating')
plt.show()
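
An average can be misleading when a genre has only a few ratings, so it may help to look at the count next to the mean. The sketch below shows one way to do that with pandas; the rows are made-up sample data in the same shape as the tuples fetched in Step 3, not real results.

```python
import pandas as pd

# Made-up rows with the same columns as the Step 3 query result.
sample = [
    ("Movie A", "Drama", "Director X", "Actor 1", 2001, 8),
    ("Movie A", "Drama", "Director X", "Actor 1", 2001, 6),
    ("Movie B", "Comedy", "Director Y", "Actor 2", 2010, 9),
]
df = pd.DataFrame(
    sample, columns=["Title", "Genre", "Director", "Actors", "Year", "Rating"]
)

# Mean rating and number of ratings per genre, highest mean first.
summary = (
    df.groupby("Genre")["Rating"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

You can then filter out genres with a low count before plotting, for example summary[summary["count"] >= 2].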

This is a simple example of analyzing movie and ratings data with SQL, MySQL, and Python. From here you can extend the project to other tasks, such as building movie recommendations or predicting movie ratings.