movie_recommender.py

This script uses pandas for data handling and scikit-learn for text vectorization and similarity scoring. It implements a simple content-based approach: movies are ranked by their cosine similarity to the ones you have liked in the past.

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

1. Load the movie dataset

def load_movie_data(file_path):
    # Read the CSV into a DataFrame (expects title, genres, and description columns)
    movies = pd.read_csv(file_path)
    print("Movie data successfully loaded.")
    return movies

2. Prepare data using TF-IDF on genres and descriptions

def prepare_data(movies):
    tfidf = TfidfVectorizer(stop_words="english")
    # Replace missing values so the string concatenation below never yields NaN
    movies["description"] = movies["description"].fillna("")
    movies["genres"] = movies["genres"].fillna("")
    tfidf_matrix = tfidf.fit_transform(movies["description"] + " " + movies["genres"])
    print("Data preparation complete.")
    return tfidf_matrix

3. Learning from user history

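def learn_from_user_history(user_history, movies, tfidf_matrix):
    # Boolean mask of the rows whose title appears in the user's history
    liked_mask = movies["title"].isin(user_history).to_numpy()
    if not liked_mask.any():
        raise ValueError("None of the titles in user_history were found in the dataset.")
    # Average the liked rows into a dense 1 x n_features array; recent scikit-learn
    # versions reject the np.matrix that a sparse .mean() returns
    user_profile = tfidf_matrix[liked_mask].toarray().mean(axis=0, keepdims=True)
    print("User profile created based on past preferences.")
    return user_profile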

4. Recommending movies

def recommend_movies(user_profile, movies, tfidf_matrix, top_n=10):
    similarity_scores = cosine_similarity(user_profile, tfidf_matrix)
    # argsort is ascending, so the last top_n positions hold the best matches
    similar_movies_indices = similarity_scores.argsort().flatten()[-top_n:]
    recommendations = movies.iloc[similar_movies_indices][::-1]  # Reverse order for descending scores
    return recommendations[["title", "genres", "description"]]

5. Main function

def main(file_path, user_history):
    movies = load_movie_data(file_path)
    tfidf_matrix = prepare_data(movies)
    user_profile = learn_from_user_history(user_history, movies, tfidf_matrix)
    recommendations = recommend_movies(user_profile, movies, tfidf_matrix)

    print("\nRecommended Movies for You:")
    for index, row in recommendations.iterrows():
        print(f"Title: {row['title']}\nGenres: {row['genres']}\nDescription: {row['description']}\n")

Example script execution

file_path = "path_to_movie_database.csv"  # Replace with your movie database path
user_history = ["Movie A", "Movie B", "Movie C"]  # Replace with a list of movies you've liked
main(file_path, user_history)

load_movie_data: This function loads movie data from a CSV file. The CSV file must contain the columns title, genres, and description, since the later functions read all three.
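Because the script reads the title, genres, and description columns, a small check can fail early when one of them is missing. The sketch below is illustrative only: REQUIRED_COLUMNS and check_columns are hypothetical names that are not part of the original script, and the inline CSV text stands in for a real file.

```python
import io

import pandas as pd

# Columns the script reads later; taken from the description above.
REQUIRED_COLUMNS = ("title", "genres", "description")


def check_columns(movies):
    """Raise ValueError if a required column is missing (hypothetical helper)."""
    missing = [col for col in REQUIRED_COLUMNS if col not in movies.columns]
    if missing:
        raise ValueError(f"Movie data is missing columns: {missing}")
    return movies


# Toy in-memory CSV used in place of a real file path.
sample_csv = io.StringIO(
    "title,genres,description\n"
    "Movie A,Action,A hero saves the city\n"
    "Movie B,Comedy,\n"
)
movies = check_columns(pd.read_csv(sample_csv))
print(movies.shape)  # (2, 3)
```

The second row deliberately has an empty description, which pandas reads as NaN; this is the case the fillna call in prepare_data is meant to handle.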

prepare_data: This function uses the TF-IDF (Term Frequency-Inverse Document Frequency) technique to turn each movie's description and genres into a numeric vector, weighting terms that are frequent in one movie but rare across the dataset more heavily. These vectors are what the similarity analysis compares to determine which movies resemble your past favorites.
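To make the TF-IDF step concrete, here is a minimal sketch on three made-up "description + genres" strings. The texts are invented for illustration; only TfidfVectorizer and cosine_similarity come from the script itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "description + genres" strings; the texts are made up for illustration.
docs = [
    "space crew explores a distant planet Sci-Fi",
    "astronauts stranded on a distant planet Sci-Fi",
    "two friends open a bakery in Paris Comedy",
]

tfidf = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(docs)  # sparse matrix, one row per movie

# Pairwise cosine similarity between all rows of the matrix.
similarity = cosine_similarity(tfidf_matrix)

print(tfidf_matrix.shape[0])                # number of movies (rows)
print(similarity[0, 1] > similarity[0, 2])  # the two space movies are closer
```

The first two strings share several terms, so their vectors point in a similar direction; the third shares none with the first, so that pair scores zero.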

learn_from_user_history: This function builds a "user profile" based on your past movie preferences. It selects the movies you have liked, calculates their average TF-IDF vector, and creates a composite profile that represents your tastes.
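The averaging step can be sketched with a few hand-written vectors. This is a toy example under stated assumptions: the numbers are invented, and plain NumPy arrays stand in for the sparse TF-IDF matrix that the script builds.

```python
import numpy as np

# Toy TF-IDF-like vectors, one row per movie; the numbers are made up.
movie_vectors = np.array([
    [1.0, 0.0, 0.0],   # Movie A
    [0.0, 1.0, 0.0],   # Movie B
    [0.0, 0.0, 1.0],   # Movie C
])
titles = ["Movie A", "Movie B", "Movie C"]
user_history = ["Movie A", "Movie B"]

# Row positions of the liked movies.
liked_rows = [i for i, title in enumerate(titles) if title in user_history]

# The profile is the element-wise mean of the liked rows, kept 2-D (1 x n_features).
user_profile = movie_vectors[liked_rows].mean(axis=0, keepdims=True)
print(user_profile)
```

Here the profile gives equal weight to the first two features and none to the third, reflecting that Movie C was not in the history.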

recommend_movies: Using cosine similarity, this function calculates the similarity between your user profile and each movie in the database. It then returns a specified number of top movie recommendations (default is 10), sorted by how closely they match your profile. Note that movies already in your history are not filtered out, so they will usually rank near the top of the results.
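The ranking step can be written out directly from the cosine formula. The sketch below is not the script's implementation: rank_by_cosine and its exclude parameter are hypothetical, added here to show one way of skipping rows such as movies the user has already liked.

```python
import numpy as np


def rank_by_cosine(user_profile, movie_vectors, exclude=(), top_n=10):
    """Return row indices of the top_n most similar movies, best first.

    Hypothetical helper: `exclude` lists row indices to skip, for example
    movies the user has already liked.
    """
    profile = np.asarray(user_profile, dtype=float).ravel()
    vectors = np.asarray(movie_vectors, dtype=float)

    # cosine(u, v) = (u . v) / (|u| * |v|); zero-length vectors score 0.
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(profile)
    scores = np.divide(vectors @ profile, norms,
                       out=np.zeros(len(vectors)), where=norms > 0)

    exclude = list(exclude)
    if exclude:
        scores[exclude] = -np.inf            # never recommend excluded rows
    order = np.argsort(scores)[::-1]         # highest score first
    order = [int(i) for i in order if np.isfinite(scores[i])]
    return order[:top_n]


movie_vectors = np.array([
    [1.0, 0.0, 0.0],   # row 0: liked
    [0.9, 0.1, 0.0],   # row 1: close to row 0
    [0.0, 0.0, 1.0],   # row 2: unrelated
])
user_profile = movie_vectors[[0]].mean(axis=0, keepdims=True)
print(rank_by_cosine(user_profile, movie_vectors, exclude=[0], top_n=2))  # [1, 2]
```

Without the exclude argument, row 0 would come first because a movie is always most similar to a profile built from itself.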

main: The main function integrates all the previous functions and displays the final movie recommendations. It accepts a file path to your movie dataset and a list of movies you liked in the past.

Instructions for Running the Script

1. Prepare a CSV file with movie data, including columns for title, genres, and description.
2. Save the script as movie_recommender.py and upload it to GitHub.
3. In the README.md, provide instructions on how to use the script, including details for setting user_history.
4. Run the script, test it with different sets of past favorite movies, and adjust the algorithm as needed to improve recommendations.

DhanushNehru / Python-Scripts

Movie Recommendation System #358