A recommender system is a type of information filtering system that provides personalized suggestions to users based on their preferences and behaviors. These systems are widely used to help users navigate through large amounts of data and find items that are most relevant to them. Recommender systems can be used in various sectors including:
E-commerce: suggesting products to customers based on their browsing history and past purchases (e.g., Amazon).
Entertainment: recommending movies, TV shows, or music based on user preferences (e.g., Netflix, Spotify).
Social media: curating content feeds and suggesting friends or groups (e.g., Facebook, Instagram).
Advertising: displaying targeted ads based on user behavior and interests (e.g., Google Ads).
News: recommending articles that match user interests (e.g., personalized news feeds).
Academic research: suggesting relevant research papers and collaborators (e.g., Google Scholar).
This project aims to develop a personalized movie recommender system. The system uses cosine similarity to provide movie suggestions tailored to individual user preferences and historical interactions. By analyzing user ratings and movie data, it can predict and suggest movies that users are likely to enjoy. Cosine similarity is a metric that scores how alike two vectors are; a recommender can apply it to users' rating vectors to find people with similar tastes, or to movies' feature vectors to find similar films.
With an ever-growing library of movies available on various platforms, users often struggle to find content that matches their tastes. This project addresses that challenge by creating a personalized movie recommendation system. The goal is to enhance the user experience by suggesting movies that users are likely to find appealing, based on their past viewing history and the preferences of similar users. The project targets streaming platforms that aggregate shows from various cable networks: it suggests movies and TV shows purely on the basis of their similarity, regardless of which network airs them.
Cosine similarity is a metric used to measure the similarity between two vectors by calculating the cosine of the angle between them. In this movie recommender system project, each movie can be represented as a vector of features, such as genres, actors, directors, or other characteristics.
Because each feature (e.g., genres, cast, production companies, popularity) is one component of a movie's vector, every movie becomes a point in a high-dimensional feature space. Cosine similarity measures how "close" two movie vectors are in that space.
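Cosine similarity depends on the direction of the vectors rather than their magnitude. This matters when two movies have very different absolute values (e.g., budget) but similar patterns across features: multiplying a whole vector by a constant leaves the score unchanged. It does not, however, compensate for differences in scale between individual features. A feature measured in large units, such as revenue, will still dominate one measured in small units, such as vote average, unless the numeric features are scaled first.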
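The magnitude-independence described above can be illustrated with a minimal sketch. The three feature vectors and their meaning (genre weights) are made up for illustration and are not taken from the project's dataset.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical feature vectors: [action, comedy, drama] weights.
movie_a = [1.0, 0.0, 2.0]
movie_b = [10.0, 0.0, 20.0]   # same pattern as movie_a, ten times larger
movie_c = [0.0, 3.0, 0.0]     # shares no non-zero feature with movie_a

sim_ab = cosine_similarity(movie_a, movie_b)
sim_ac = cosine_similarity(movie_a, movie_c)
print(round(sim_ab, 6), round(sim_ac, 6))
```

movie_a and movie_b point in the same direction, so their score is 1 even though their magnitudes differ; movie_a and movie_c have no feature in common, so their score is 0.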
In a recommender system, the feature space can become very large, especially after categorical variables are encoded. Cosine similarity remains cheap to compute in such spaces because it needs only a dot product and two vector norms.
In collaborative filtering, user-item interaction data (such as movie ratings) can be represented as vectors. Cosine similarity is often used to compute similarities between users (user-based collaborative filtering) or between items (item-based collaborative filtering), which makes it possible to recommend movies based on the preferences of similar users or on similar movies.
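As a rough sketch of both variants, the example below builds a small, invented ratings matrix and compares its rows (users) and its columns (movies). Treating a 0 as "not rated" is a simplifying assumption; the helper name pairwise_cosine is hypothetical.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are movies, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def pairwise_cosine(matrix):
    """Cosine similarity between every pair of rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for all-zero rows
    unit = matrix / norms
    return unit @ unit.T

user_sim = pairwise_cosine(ratings)      # user-based: compare rows
item_sim = pairwise_cosine(ratings.T)    # item-based: compare columns

# Movie most similar to movie 0, excluding movie 0 itself.
scores = item_sim[0].copy()
scores[0] = -np.inf
most_similar_to_0 = int(np.argmax(scores))
print(most_similar_to_0)
```

In this toy matrix, movies 0 and 1 are rated highly by the same users, so movie 1 comes out as the nearest neighbor of movie 0.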
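Movie datasets are often sparse, meaning that many features (such as a particular cast member or genre) do not apply to every movie. Cosine similarity handles this sparsity well: only the dimensions that are non-zero in both vectors contribute to the dot product, so features that are absent from either movie add nothing to the score.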
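One way to see this is to store each movie as a dictionary that lists only the features that apply to it. This is a minimal sketch; the feature names and the function sparse_cosine are illustrative assumptions, not part of the project's code.

```python
import math

def sparse_cosine(a, b):
    """Cosine similarity for sparse vectors stored as {feature: value} dicts."""
    # Only features present in both vectors contribute to the dot product.
    shared = a.keys() & b.keys()
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical movies, described only by the features that apply to them.
movie_a = {"genre:crime": 1, "genre:thriller": 1, "cast:actor_x": 1}
movie_b = {"genre:crime": 1, "genre:drama": 1, "cast:actor_x": 1}
movie_c = {"genre:animation": 1, "genre:comedy": 1}

print(sparse_cosine(movie_a, movie_b))
print(sparse_cosine(movie_a, movie_c))
```

movie_a and movie_b share two of their three features, while movie_a and movie_c share none, so the second score is exactly 0.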
The first step is loading the data and understanding the dataset, which involves reviewing its columns, data types, and summary statistics to gain initial insights.
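A first look at the data might be sketched as follows with pandas. The inline CSV text stands in for the real file, and its column names and values are assumptions made for this example.

```python
import io
import pandas as pd

# Stand-in for a movie metadata file; columns and values are invented.
csv_text = """title,genre,budget,revenue,vote_average
Movie A,Action,100000000,250000000,7.1
Movie B,Comedy,20000000,45000000,6.3
Movie C,Drama,5000000,,7.8
"""

# With a real file, pass its path to pd.read_csv instead of a StringIO object.
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.shape)        # (rows, columns)
print(movies.dtypes)       # data type of each column
print(movies.describe())   # summary statistics for the numeric columns
```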
Key features are the columns that carry the most useful information for analysis and model building.
Exploratory data analysis (EDA) involves visually and statistically exploring a dataset to uncover patterns, trends, and relationships.
Data preprocessing involves cleaning, transforming, and organizing raw data into a format that can be used effectively for analysis and model building. Typical steps include checking for missing values, encoding categorical variables, and feature engineering.
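The missing-value check and a simple engineered feature could look like the sketch below. The data, the fill strategies (a placeholder label and the column median), and the log_revenue feature are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing genre and one missing revenue.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genre": ["Action", None, "Drama"],
    "revenue": [250_000_000.0, np.nan, 12_000_000.0],
})

# Count missing values per column.
missing_per_column = movies.isna().sum()
print(missing_per_column)

cleaned = movies.copy()
cleaned["genre"] = cleaned["genre"].fillna("Unknown")
cleaned["revenue"] = cleaned["revenue"].fillna(cleaned["revenue"].median())

# Simple engineered feature: log-transformed revenue compresses very large values.
cleaned["log_revenue"] = np.log1p(cleaned["revenue"])
```

Whether to fill, drop, or flag missing values depends on the column; the median is used here only because it is robust to extreme revenues.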
Encoding converts categorical variables into a numerical format that machine learning models can work with. A scheme such as one-hot encoding gives each category its own binary column, so models can interpret categorical data without assuming any ordinal relationship between categories.
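Assuming the technique described here is one-hot encoding, a minimal sketch with pandas follows. The DataFrame contents are invented for the example.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genre": ["Action", "Comedy", "Action"],
})

# Each distinct genre becomes its own 0/1 column, so no ordering is implied.
encoded = pd.get_dummies(movies, columns=["genre"], dtype=int)
print(encoded)
```

Columns that hold several values per movie (for example, a list of genres or cast members) need a multi-label variant of the same idea, which is not shown here.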
Numeric features such as budget and revenue are scaled to a common range so that features with large values do not dominate the similarity score.
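Min-max scaling is one common choice for this step; whether the project uses it or another scaler is an assumption. The sketch below rescales each column of an invented DataFrame to the range 0 to 1, using a hypothetical helper named min_max_scale.

```python
import pandas as pd

movies = pd.DataFrame({
    "budget": [5_000_000, 20_000_000, 100_000_000],
    "vote_average": [7.8, 6.3, 7.1],
})

def min_max_scale(frame):
    """Rescale every column of `frame` to the range [0, 1]."""
    span = frame.max() - frame.min()
    span = span.replace(0, 1)   # constant columns map to 0 instead of dividing by zero
    return (frame - frame.min()) / span

scaled = min_max_scale(movies)
print(scaled)
```

After scaling, budget and vote average both lie between 0 and 1, so neither outweighs the other purely because of its units.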
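Cosine similarity is a metric used to measure how similar two vectors are, regardless of their magnitude. It is the cosine of the angle between the two vectors, and its value ranges from -1 to 1: 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. When every feature value is non-negative, the score falls between 0 and 1.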
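Written out, the standard definition for two n-dimensional vectors a and b is shown below, followed by a small worked example. The example vectors are arbitrary and chosen only to keep the arithmetic simple.

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^{2}}\;\sqrt{\sum_{i=1}^{n} b_i^{2}}}
\]

\[
\mathbf{a} = (1, 0, 2),\quad \mathbf{b} = (2, 1, 2):\qquad
\cos\theta
  = \frac{1\cdot 2 + 0\cdot 1 + 2\cdot 2}{\sqrt{5}\,\sqrt{9}}
  = \frac{6}{3\sqrt{5}}
  \approx 0.894
\]
```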
The recommendation step calculates the similarity between movies from selected features (such as genres, cast, crew, production companies, budget, and revenue) using cosine similarity. The system can then suggest movies that are closely related in content or characteristics, according to the selected features and the weight applied to each.
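One way such a weighted, feature-based recommendation step could be sketched is shown below. The feature table, the weight values, and the function name recommend are all hypothetical; the real project may combine features differently.

```python
import numpy as np
import pandas as pd

# Hypothetical features for five movies, already encoded and scaled.
features = pd.DataFrame(
    {
        "genre_Action": [1, 1, 0, 0, 1],
        "genre_Comedy": [0, 0, 1, 1, 0],
        "cast_actor_x": [1, 1, 0, 0, 0],
        "budget_scaled": [0.9, 0.8, 0.1, 0.2, 0.5],
    },
    index=["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
)

# Assumed weights: how much each feature counts toward the similarity score.
weights = pd.Series(
    {"genre_Action": 2.0, "genre_Comedy": 2.0, "cast_actor_x": 1.0, "budget_scaled": 0.5}
)

def recommend(title, features, weights, top_n=3):
    """Return the top_n titles most similar to `title` by weighted cosine similarity."""
    weighted = features.mul(weights, axis=1).to_numpy(dtype=float)
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                      # guard against all-zero rows
    unit = weighted / norms
    row = features.index.get_loc(title)
    scores = pd.Series(unit @ unit[row], index=features.index)
    return scores.drop(title).sort_values(ascending=False).head(top_n)

print(recommend("Movie A", features, weights, top_n=2))
```

Multiplying each column by its weight before normalizing is a simple way to let genres count more than budget; with these toy values, the two action movies rank closest to Movie A.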
The next phase of my project involves deploying the movie recommender system with Streamlit to create an interactive, user-friendly interface. Streamlit will allow users to enter their preferences and view personalized movie recommendations in real time.