This project implements an automated workflow for creating engaging podcasts from academic texts using AI-powered agents. The system takes a PDF file as input, processes its content, and generates an audio podcast with playful banter between a host and a guest. It also includes a self-improving mechanism that optimizes the prompts used in the podcast creation process based on user feedback.
The system consists of the following components:
- Podcast Creation (src/paudio.py)
- Feedback Collection and Prompt Optimization (src/paudiowithfeedback.py, src/utils/textGDwithWeightClipping.py)
- Continuous Improvement Cycle
- Simulation and Evaluation (src/simulation.py, src/evaluation.py)
- Web Interface (frontend/, backend/fast_api_app.py)
This integrated system creates a feedback loop where each podcast generation, user interaction, and optimization cycle contributes to improving the overall quality of the AI-generated podcasts. The use of timestamps throughout the process ensures version control and allows for detailed analysis of the system's evolution over time.
Create and activate a Conda environment:
```
conda create -n podcast python=3.12
conda activate podcast
conda install pip
```
Or run them as a single command:
```
conda create -n podcast python=3.12 -y && conda activate podcast && conda install pip -y
```
Install uvicorn:
```
pip install uvicorn
```
Install required Python packages:
```
pip install -r requirements.txt
```
Install Rust (Cargo is required to build some packages):
If the previous step failed with this error:
```
Cargo, the Rust package manager, is not installed or is not on PATH.
```
then the package that caused the error requires Rust and Cargo to compile its extensions. Install Rust through your system's package manager, via https://rustup.rs, or by following the instructions at https://www.rust-lang.org/tools/install.
Follow the instructions for your shell (bash, zsh, etc.) carefully so that Rust is added to your PATH and Cargo is available, then rerun pip install -r requirements.txt.
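Set up the OpenAI API key:
Obtain your OpenAI API key from https://platform.openai.com/api-keys, then either set it as an environment variable:
On Windows (Command Prompt):
```
set OPENAI_API_KEY=your_openai_api_key_here
```
On macOS/Linux:
```
export OPENAI_API_KEY=your_openai_api_key_here
```
Or copy sample.env to a .env file in the project root:
On Windows:
```
copy sample.env .env
```
On macOS/Linux:
```
cp sample.env .env
```
Then edit .env and set your key:
```
OPENAI_API_KEY=your_api_key_here
```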
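A .env file is just a list of KEY=value lines. As a minimal sketch, assuming nothing beyond the standard library, the following shows how such a file can be read into the environment without overriding a variable that was already exported in the shell, plus a check that fails early when the key is missing or still the placeholder. The helper names load_env_file and require_api_key are hypothetical; the project itself may load .env differently (for example with the python-dotenv package).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Read KEY=value lines from a .env-style file into os.environ.

    Hypothetical helper for illustration. Variables already set in the
    environment take precedence over values from the file.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        # Drop optional surrounding quotes around the value.
        value = value.strip().strip('"').strip("'")
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def require_api_key() -> str:
    """Return OPENAI_API_KEY, or fail if it is unset or still a placeholder."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key or key.startswith("your_"):
        raise RuntimeError("OPENAI_API_KEY is not set; see the setup steps.")
    return key
```

Using setdefault means a key exported in the shell wins over the file, which matches the usual convention that the environment overrides .env.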
Install Node.js and npm: Node.js and npm are required for the frontend. Here's how to install them on different operating systems:
For macOS (using Homebrew):
```
brew install node
```
For Ubuntu or Debian-based distributions:
```
sudo apt update
sudo apt install nodejs npm
```
For other distributions, refer to your package manager or the official Node.js documentation.
Verify the installation by running:
```
node --version
npm --version
```
Install frontend dependencies:
```
cd frontend
npm install
```
Timestamps are used in this project to version control the prompts used by the AI agents. Each time the system generates a podcast and receives feedback, it optimizes the prompts and saves them with a new timestamp. This allows the system to track the evolution of prompts over time and use the most recent or specific versions when creating new podcasts.
Generate a Podcast:
```
python src/paudio.py <path_to_pdf_file> [--timestamp YYYYMMDD_HHMMSS]
```
Options:
- <path_to_pdf_file>: Path to the PDF file you want to convert into a podcast.
- --timestamp YYYYMMDD_HHMMSS: (Optional) Use prompts from a specific timestamp. If not provided, it uses the most recent prompts.
- --timestamp last: Use the most recent timestamp (same as not providing a timestamp).

Examples:
```
python src/paudio.py path/to/your/file.pdf
python src/paudio.py path/to/your/file.pdf --timestamp 20230615_120000
python src/paudio.py path/to/your/file.pdf --timestamp last
```
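The options above map naturally onto argparse. This is a sketch of an equivalent parser, not the actual code in src/paudio.py; the validation rule (accept last or a well-formed YYYYMMDD_HHMMSS value, defaulting to last) is an assumption based on the options listed, and timestamp_arg and build_parser are hypothetical names.

```python
import argparse
from datetime import datetime


def timestamp_arg(value: str) -> str:
    """argparse type: accept 'last' or a YYYYMMDD_HHMMSS timestamp."""
    if value == "last":
        return value
    try:
        datetime.strptime(value, "%Y%m%d_%H%M%S")
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected YYYYMMDD_HHMMSS or 'last', got {value!r}"
        ) from None
    return value


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF into a podcast (illustrative parser)."
    )
    parser.add_argument("pdf_path", help="Path to the PDF file to convert.")
    parser.add_argument(
        "--timestamp",
        type=timestamp_arg,
        default="last",  # omitting the flag behaves like --timestamp last
        help="Prompt version to use (default: the most recent).",
    )
    return parser


# Example: build_parser().parse_args(["paper.pdf", "--timestamp", "20230615_120000"])
```

Validating the format in the type function makes a typo like 2023-06-15 fail at parse time with a clear message instead of surfacing later as a missing prompt file.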
Generate a Podcast with Feedback:
```
python src/paudiowithfeedback.py <path_to_pdf_file> [--timestamp YYYYMMDD_HHMMSS]
```
This script creates a podcast and allows you to provide feedback, which is then used to optimize the prompts.
Options:
- <path_to_pdf_file>: Path to the PDF file you want to convert into a podcast.
- --timestamp YYYYMMDD_HHMMSS: (Optional) Use prompts from a specific timestamp.
- --timestamp last: Use the most recent timestamp (default behavior if no timestamp is provided).

The script will generate the podcast, collect your feedback, and use that feedback to optimize the prompts.
How it works with timestamps: the podcast is generated with the prompts from the selected timestamp (or the most recent ones), and the optimized prompts are saved under a new timestamp, so earlier versions remain available.
Examples:
```
python src/paudiowithfeedback.py path/to/your/file.pdf
python src/paudiowithfeedback.py path/to/your/file.pdf --timestamp 20230615_120000
python src/paudiowithfeedback.py path/to/your/file.pdf --timestamp last
```
Run Self-Improving Simulation:
```
python src/simulation.py
```
This script runs a simulation of the podcast creation and prompt optimization process, using the PDF files in the arxiv_papers folder in the project root directory.
Note: Before running the simulation, make sure to add PDF files to the arxiv_papers folder.
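The self-improvement cycle that the simulation automates (generate a podcast with the current prompts, obtain feedback, optimize the prompts, save them under a new timestamp) can be summarized as a loop. In this sketch the generation, feedback, and optimization steps are passed in as plain callables because their real implementations (LLM calls, TTS) are not shown here; PromptStore, run_simulation, and the dict-of-prompts representation are all hypothetical.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass, field

Prompts = dict[str, str]  # agent role -> prompt text (assumed representation)


@dataclass
class PromptStore:
    """In-memory stand-in for timestamped prompt versions (hypothetical)."""

    versions: dict[str, Prompts] = field(default_factory=dict)

    def latest(self) -> Prompts:
        # YYYYMMDD_HHMMSS keys sort chronologically, so max() is the newest.
        return self.versions[max(self.versions)]

    def save(self, timestamp: str, prompts: Prompts) -> None:
        self.versions[timestamp] = dict(prompts)


def run_simulation(
    pdf_paths: Iterable[str],
    store: PromptStore,
    generate: Callable[[str, Prompts], str],      # (pdf, prompts) -> transcript
    get_feedback: Callable[[str], str],           # transcript -> feedback text
    optimize: Callable[[Prompts, str], Prompts],  # (prompts, feedback) -> new prompts
    next_timestamp: Callable[[], str],            # returns a fresh YYYYMMDD_HHMMSS
) -> None:
    """Run one generate -> feedback -> optimize -> save step per PDF."""
    for pdf in pdf_paths:
        prompts = store.latest()
        transcript = generate(pdf, prompts)
        feedback = get_feedback(transcript)
        store.save(next_timestamp(), optimize(prompts, feedback))
```

Each iteration starts from the newest saved prompts, so improvements accumulate across papers while every intermediate version stays in the store for later evaluation.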
Evaluate Self-Improvement Process:
```
python src/evaluation.py
```
This script evaluates the quality of generated podcasts over time, using the PDF files in the arxiv_papers folder.
Start the Web Interface:
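Start the backend server from the backend directory:
```
cd backend
uvicorn fast_api_app:app --reload
```
In a separate terminal, start the frontend:
```
cd frontend
npm install
npm start
```
Then open http://localhost:3000 in your browser.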
Project structure:
- src/paudio.py: Main script for podcast creation
- src/paudiowithfeedback.py: Script for podcast creation with feedback collection
- src/utils/textGDwithWeightClipping.py: Prompt optimization script
- src/simulation.py: Simulation of the self-improvement process
- src/evaluation.py: Evaluation script for generated podcasts
- backend/fast_api_app.py: FastAPI backend application
- frontend/: React-based frontend application
- requirements.txt: List of Python dependencies

This project uses OpenAI's GPT models, which require an API key and may incur costs. Ensure you have appropriate credits or billing set up with OpenAI.
The sections above cover setup and usage; the sections below describe the ideas behind the self-improvement mechanism.
You can try this AI-powered podcast creation tool for free at https://www.metaskepsis.com/. Experience the power of AI-generated podcasts and see how this system can transform academic texts into engaging audio content.
This project draws inspiration from TextGrad, a novel approach to optimization in natural language processing introduced by Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, and James Zou.
An additional feature implemented here is a "weight clipper" concept, which draws an interesting parallel to gradient clipping in traditional stochastic gradient descent (SGD). In SGD, gradient clipping prevents exploding gradients and ensures stable training. Similarly, in this TextGrad-inspired implementation, the weight clipper constrains modifications to prompts or other textual elements during the optimization process. This helps maintain coherence, prevents drastic changes to the text, and keeps textual modifications meaningful and aligned with the original intent. This approach adapts optimization concepts to the unique challenges of working with natural language, bridging the gap between traditional machine learning techniques and language model optimization.
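One way to read the weight-clipping idea is to bound how far a single optimization step may move a prompt. The sketch below measures the size of a proposed edit with difflib.SequenceMatcher and, if the proposed prompt differs from the current one by more than a threshold, rejects the update and keeps the current prompt. This is an illustrative interpretation under that assumption: the distance measure, the 0.3 default threshold, and the reject-on-violation rule are not taken from src/utils/textGDwithWeightClipping.py, which may constrain edits differently (for example by instructing the LLM to limit its changes).

```python
from difflib import SequenceMatcher


def change_ratio(old: str, new: str) -> float:
    """Fraction of the text that changed: 0.0 = identical, 1.0 = nothing shared."""
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def clip_prompt_update(old: str, proposed: str, max_change: float = 0.3) -> str:
    """Apply a proposed prompt only if the edit stays within max_change.

    Analogous to gradient clipping: an update that is too large is not
    applied as-is. In this hypothetical rule it is rejected outright and
    the old prompt is kept.
    """
    if not 0.0 <= max_change <= 1.0:
        raise ValueError("max_change must be between 0 and 1")
    return proposed if change_ratio(old, proposed) <= max_change else old


old = "You are a witty host. Summarize the paper in plain language."
small_edit = "You are a witty host. Summarize the paper in plain, friendly language."
rewrite = "Write a formal abstract."
print(clip_prompt_update(old, small_edit))  # small edit is accepted
print(clip_prompt_update(old, rewrite))     # drastic rewrite is rejected
```

Unlike numeric gradient clipping, which rescales an oversized update, text has no obvious way to apply "part" of an edit, so rejecting (or asking the optimizer for a smaller revision) is the simplest analogue.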
Another challenge addressed in this project is applying gradients to a chain of agents in LangGraph. The solution implemented here uses a role-specific loss function for each agent while providing the same final feedback to all agents, so each agent independently decides what to change based on its specific role and the overall feedback. Deciding how to distribute a single feedback signal across a chain of agents is a form of the credit assignment problem in reinforcement learning, a distinct and complex problem with no one-size-fits-all solution. This method attempts to optimize the entire chain while respecting each agent's function within the larger process, though it is an area that likely warrants further exploration and refinement.
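To make the shared-feedback idea concrete: every agent receives the same final feedback, but each agent's "loss" is a critique prompt framed by its own role, so its prompt is revised only with respect to its own job. The sketch below only builds those per-agent critique prompts; the Agent dataclass, the role names, and the template wording are hypothetical, and the resulting strings would be sent to an LLM (not shown) to produce each agent's textual gradient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    role: str            # e.g. "summarizer" (hypothetical role name)
    responsibility: str  # what this agent contributes to the podcast
    prompt: str          # the agent's current prompt, i.e. its "weights"


LOSS_TEMPLATE = (
    "You are reviewing the prompt of the {role} in a podcast pipeline.\n"
    "Its responsibility: {responsibility}\n"
    "Current prompt:\n{prompt}\n\n"
    "Listener feedback on the final podcast:\n{feedback}\n\n"
    "Considering only what the {role} controls, explain how its prompt "
    "should change. If the feedback concerns another stage, say so and "
    "suggest no change."
)


def role_specific_losses(agents: list[Agent], feedback: str) -> dict[str, str]:
    """Build one critique prompt per agent from the same shared feedback."""
    return {
        agent.role: LOSS_TEMPLATE.format(
            role=agent.role,
            responsibility=agent.responsibility,
            prompt=agent.prompt,
            feedback=feedback,
        )
        for agent in agents
    }


agents = [
    Agent("summarizer", "condense the paper into key points", "Summarize the paper."),
    Agent("scriptwriter", "turn key points into host/guest banter", "Write a dialogue."),
]
losses = role_specific_losses(agents, "The jokes felt forced.")
```

The explicit "suggest no change" escape hatch is one simple way to let an agent decline responsibility for feedback aimed at a different stage of the chain.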
We welcome collaboration on this project, especially in the areas of self-improving prompts and local Text-to-Speech (TTS) solutions. If you're interested in contributing:
Self-Improving Prompts: We're particularly keen on enhancing our prompt optimization techniques. If you have experience with natural language processing, machine learning, or prompt engineering, your insights could be invaluable in refining our self-improvement algorithms.
Local TTS Solutions: While our current system uses cloud-based TTS, we're exploring local TTS options to enhance privacy and reduce dependency on external services. If you have expertise in TTS technologies or experience with open-source TTS libraries, we'd love your input on integrating these into our workflow.
General Improvements: Whether it's optimizing our code, enhancing the user interface, or expanding the system's capabilities, all contributions are welcome.
To get involved, please check our GitHub repository, open an issue to discuss your ideas, or submit a pull request with your proposed changes. Let's work together to push the boundaries of AI-powered podcast creation and make this tool even more powerful and accessible!