# Enhancement: Integrate Open-Source LLM for Movie Information Retrieval

## Description

Enhance the existing web crawler to use an open-source large language model (LLM) to fetch and display detailed movie information based on user input. The information should include:

- Movie summary
- Reviews
- Runtime
- Reasons to watch

Provide an option for the user to choose which LLM to use for the search.
## Tasks

1. **Integrate an open-source LLM API:**
   - Use an open-source LLM such as LLaMA or Mistral to fetch movie information.
   - Create a function that queries the chosen LLM with the movie name and retrieves the required details.
2. **Create a user input interface:**
   - Implement a simple terminal-based prompt for users to enter the movie name and choose the LLM.
   - Validate the input to ensure it is not empty (see the input-validation sketch after this list).
3. **Fetch initial data using the web crawler:**
   - Use the existing web crawler to fetch initial data such as the movie URL, basic info, and reviews.
   - Pass this data as context to the LLM to enhance its response.
4. **Fetch and display movie information:**
   - Use the chosen LLM to produce the movie summary, reviews, runtime, and reasons to watch.
   - Display the fetched information in a user-friendly format.
5. **Surprise enhancement (movie recommendations):**
   - Use the LLM to generate a list of similar movies based on the user's input.
   - Display the recommended movies along with the fetched information.
6. **Update `requirements.txt`:**
   - Add the `transformers` library to `requirements.txt`.
7. **Create `README.md`:**
   - Add setup and run instructions to a new `README.md` file.
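A minimal sketch of the input validation described in task 2, assuming hypothetical helpers `prompt_non_empty` and `prompt_llm_choice` (neither exists in the codebase yet); the accepted LLM names mirror the choices used in the example code below:

```python
# Hypothetical helpers for task 2: re-ask until the input is usable.
SUPPORTED_LLMS = {'llama', 'mistral'}


def prompt_non_empty(message: str) -> str:
    """Prompt repeatedly until the user enters a non-empty value."""
    while True:
        value = input(message).strip()
        if value:
            return value
        print("Input cannot be empty, please try again.")


def prompt_llm_choice() -> str:
    """Prompt for the LLM choice and validate it against the supported set."""
    while True:
        choice = prompt_non_empty("Enter the LLM to use (llama/mistral): ").lower()
        if choice in SUPPORTED_LLMS:
            return choice
        print(f"Unsupported LLM '{choice}', choose one of: {', '.join(sorted(SUPPORTED_LLMS))}.")
```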
## Implementation Details

**File:** `src/movie_info.py`

**Function to add:** `get_movie_info_from_llm(movie_name: str, initial_data: dict, llm_choice: str) -> dict`

This function queries the chosen LLM with the movie name and the initial crawler data, and returns a dictionary with the keys `summary`, `reviews`, `runtime`, `reasons_to_watch`, and `recommendations`.

**Example code:**
```python
import re

import requests
from bs4 import BeautifulSoup
from transformers import pipeline


def get_initial_movie_data(movie_name: str) -> dict:
    """Fetch initial movie data using the existing web-crawler approach."""
    # NOTE: the selectors below depend on Rotten Tomatoes' current markup
    # and may need adjusting if the site changes.
    response = requests.get(
        "https://www.rottentomatoes.com/search",
        params={"search": movie_name},
        timeout=10,
    )
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'lxml')

    first_result = soup.find('search-page-media-row')
    if first_result is None:
        raise ValueError(f"No search results found for '{movie_name}'")
    movie_url = first_result.find('a').get('href')
    if not movie_url.startswith('http'):
        movie_url = f"https://www.rottentomatoes.com{movie_url}"

    movie_page = requests.get(movie_url, timeout=10)
    movie_soup = BeautifulSoup(movie_page.content, 'lxml')

    summary_tag = movie_soup.find('div', {'class': 'movie_synopsis'})
    runtime_tag = movie_soup.find('time')
    reviews = [review.text.strip() for review in movie_soup.find_all('blockquote')]
    return {
        'url': movie_url,
        'summary': summary_tag.text.strip() if summary_tag else '',
        'reviews': reviews,
        'runtime': runtime_tag.text.strip() if runtime_tag else '',
    }


def get_movie_info_from_llm(movie_name: str, initial_data: dict, llm_choice: str) -> dict:
    """Query the chosen LLM and return a dict with the required keys."""
    if llm_choice == 'llama':
        model_name = 'meta-llama/Meta-Llama-3.1-8B'
    elif llm_choice == 'mistral':
        model_name = 'mistralai/Mistral-Large-Instruct-2407'
    else:
        raise ValueError("Unsupported LLM choice")

    generator = pipeline('text-generation', model=model_name)
    prompt = (
        f"Using the following initial data about the movie {movie_name}:\n"
        f"Summary: {initial_data['summary']}\n"
        f"Reviews: {initial_data['reviews']}\n"
        f"Runtime: {initial_data['runtime']}\n"
        "Provide a detailed summary, additional reviews, the runtime, reasons to "
        "watch, and similar movie recommendations. Label the sections 'Summary:', "
        "'Reviews:', 'Runtime:', 'Reasons to watch:' and 'Recommendations:'."
    )
    generated = generator(prompt, max_new_tokens=500, return_full_text=False)[0]['generated_text']

    def section(label: str, default=''):
        """Extract one labelled section from the generated text."""
        pattern = rf"{re.escape(label)}:\s*(.*?)(?=\n\s*[A-Z][A-Za-z ]+:|\Z)"
        match = re.search(pattern, generated, re.S | re.I)
        return match.group(1).strip() if match else default

    return {
        'summary': section('Summary', initial_data['summary']),
        'reviews': section('Reviews', initial_data['reviews']),
        'runtime': section('Runtime', initial_data['runtime']),
        'reasons_to_watch': section('Reasons to watch'),
        'recommendations': section('Recommendations'),
    }


if __name__ == "__main__":
    movie_name = input("Enter the movie name: ").strip()
    llm_choice = input("Enter the LLM to use (llama/mistral): ").strip().lower()
    if movie_name and llm_choice:
        initial_data = get_initial_movie_data(movie_name)
        movie_info = get_movie_info_from_llm(movie_name, initial_data, llm_choice)
        print(f"Summary: {movie_info['summary']}")
        print(f"Reviews: {movie_info['reviews']}")
        print(f"Runtime: {movie_info['runtime']}")
        print(f"Reasons to Watch: {movie_info['reasons_to_watch']}")
        print(f"Recommendations: {movie_info['recommendations']}")
    else:
        print("Please enter a valid movie name and LLM choice.")
```
**Update `requirements.txt`:**

```text
beautifulsoup4
requests
lxml
transformers
```
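Note that the `transformers` text-generation pipeline also needs a deep-learning backend to load these models locally; if the project does not already depend on one, PyTorch (`pip install torch`) will likely need to be added as well, and the larger checkpoints require substantial GPU memory.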
**Create `README.md`:**

# Web Crawler with LLM Integration

This project is a web crawler that fetches movie information and enhances it using a large language model (LLM) to provide detailed summaries, reviews, runtime, reasons to watch, and recommendations.
## Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/web_crawler.git
   cd web_crawler
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. Install the dependencies:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up model access or API keys (if required):
   - Gated Hugging Face checkpoints such as the `meta-llama` models require accepting the model license and logging in with a Hugging Face access token (`huggingface-cli login`).
   - If you use a hosted LLM API instead (for example OpenAI), set the corresponding environment variable such as `OPENAI_API_KEY`.
## Usage

1. Run the script:

   ```bash
   python src/movie_info.py
   ```

2. Enter the movie name and choose the LLM (llama/mistral) when prompted.
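For example (the movie title below is only an illustration; the prompts come from `src/movie_info.py`):

```bash
$ python src/movie_info.py
Enter the movie name: Inception
Enter the LLM to use (llama/mistral): llama
```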
## Notes

- Make sure to handle API errors and edge cases where the movie information might not be available.
- Consider adding unit tests for the new functionality (a sketch follows below).
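As a starting point for the unit-test note above, a minimal sketch using `pytest` and `unittest.mock`; it assumes the module is importable as `src.movie_info` and patches `transformers.pipeline` there so no model weights are downloaded (the test data is made up):

```python
# test_movie_info.py -- minimal sketch; file name and import path are assumptions.
from unittest.mock import patch

import pytest

from src.movie_info import get_movie_info_from_llm

INITIAL_DATA = {'summary': 'A heist in dreams.', 'reviews': ['Great!'], 'runtime': '2h 28m'}


def test_unsupported_llm_choice_raises():
    # The function should reject anything other than 'llama' or 'mistral'.
    with pytest.raises(ValueError):
        get_movie_info_from_llm("Inception", INITIAL_DATA, "gpt-42")


def test_llm_output_is_parsed_into_sections():
    fake_output = [{'generated_text': (
        "Summary: A thief steals secrets through dreams.\n"
        "Reasons to watch: Inventive premise.\n"
        "Recommendations: Interstellar, Memento"
    )}]
    # Patch the pipeline factory imported in src/movie_info.py so the test
    # never loads a real model; the fake generator ignores its arguments.
    with patch('src.movie_info.pipeline') as mock_pipeline:
        mock_pipeline.return_value = lambda *args, **kwargs: fake_output
        info = get_movie_info_from_llm("Inception", INITIAL_DATA, "llama")
    assert info['recommendations'] == "Interstellar, Memento"
    # Sections the model omitted fall back to the crawler data.
    assert info['runtime'] == '2h 28m'
```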