Autoshow automates the processing of audio and video content from various sources, including YouTube videos, playlists, podcast RSS feeds, and local media files. It performs transcription, summarization, and chapter generation using a choice of large language models (LLMs) and transcription services.
The Autoshow workflow includes the following steps:
See docs/roadmap.md for details about current development work and potential future capabilities.
scripts/setup.sh checks that a .env file exists, that Node dependencies are installed, and that the whisper.cpp repository is cloned and built. Run it through the setup script defined in package.json.
npm run setup
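The checks described above can be approximated in TypeScript to confirm that a checkout is ready. This is a minimal sketch, not the contents of scripts/setup.sh: the `CHECKS` table, the `missingPrerequisites` helper, and the assumption that `node_modules` and `whisper.cpp` sit at the repository root are all illustrative, and it only tests that the paths exist (it does not verify that whisper.cpp was built).

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Hypothetical table of prerequisites; the real checks live in scripts/setup.sh
// and the paths below are assumptions about the repository layout.
type Check = { label: string; path: string };

const CHECKS: Check[] = [
  { label: ".env file", path: ".env" },
  { label: "Node dependencies", path: "node_modules" },
  { label: "whisper.cpp repository", path: "whisper.cpp" },
];

// Returns the label of every prerequisite whose path is missing under `root`.
// `exists` is injectable so the function can be exercised without touching disk.
function missingPrerequisites(
  root: string,
  exists: (p: string) => boolean = existsSync,
): string[] {
  return CHECKS.filter((c) => !exists(join(root, c.path))).map((c) => c.label);
}

const missing = missingPrerequisites(process.cwd());
console.log(
  missing.length === 0 ? "Setup looks complete." : `Missing: ${missing.join(", ")}`,
);
```

If anything is reported missing, running npm run setup again is the intended fix.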
Run on a single YouTube video.
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk"
Run on a YouTube playlist.
npm run as -- --playlist "https://www.youtube.com/playlist?list=PLCVnrVv4KhXPz0SoAVu8Rc1emAdGPbSbr"
Run on a list of arbitrary URLs.
npm run as -- --urls "content/example-urls.md"
Run on a local audio or video file.
npm run as -- --file "content/audio.mp3"
Run on a podcast RSS feed.
npm run as -- --rss "https://ajcwebdev.substack.com/feed"
Use a local LLM through Ollama.
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --ollama
Use third-party LLM providers.
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --chatgpt GPT_4o_MINI
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --claude CLAUDE_3_5_SONNET
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --gemini GEMINI_1_5_PRO
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --cohere COMMAND_R_PLUS
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --mistral MISTRAL_LARGE
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --fireworks
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --together
npm run as -- --video "https://www.youtube.com/watch?v=MORMZXEaONk" --groq
Example commands for all available CLI options can be found in docs/examples.md.
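When these commands are launched from another script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper, not part of Autoshow: `RunOptions` and `buildArgs` are illustrative names, and only the flags shown in the commands above are used. It assumes one input option per run, optionally followed by one LLM option and a model name.

```typescript
// Input and LLM option names are taken from the commands shown above.
type Source = "video" | "playlist" | "urls" | "file" | "rss";
type Llm =
  | "ollama" | "chatgpt" | "claude" | "gemini" | "cohere"
  | "mistral" | "fireworks" | "together" | "groq";

// Hypothetical options object for one run.
interface RunOptions {
  source: Source;
  input: string;   // URL or file path, depending on `source`
  llm?: Llm;
  model?: string;  // e.g. "GPT_4o_MINI"; omitted for flags shown without a model
}

// Builds the arguments that follow `npm` on the command line.
function buildArgs(opts: RunOptions): string[] {
  const args = ["run", "as", "--", `--${opts.source}`, opts.input];
  if (opts.llm !== undefined) {
    args.push(`--${opts.llm}`);
    if (opts.model !== undefined) args.push(opts.model);
  }
  return args;
}

const args = buildArgs({
  source: "video",
  input: "https://www.youtube.com/watch?v=MORMZXEaONk",
  llm: "chatgpt",
  model: "GPT_4o_MINI",
});
console.log(["npm", ...args].join(" "));
// prints: npm run as -- --video https://www.youtube.com/watch?v=MORMZXEaONk --chatgpt GPT_4o_MINI
```

Passing the array to `spawn` from `node:child_process` without a shell sidesteps the quoting that the URL needs when typed by hand.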
Main Entry Point (src/autoshow.ts)
Command Processors (src/commands)
  processVideo.ts: Handles single YouTube video processing
  processPlaylist.ts: Processes all videos in a YouTube playlist
  processURLs.ts: Processes videos from a list of URLs in a file
  processFile.ts: Handles local audio/video file processing
  processRSS.ts: Processes podcast RSS feeds
Utility Functions (src/utils)
  downloadAudio.ts: Downloads audio from YouTube videos
  runTranscription.ts: Manages the transcription process
  runLLM.ts: Handles LLM processing for summarization and chapter generation
  generateMarkdown.ts: Creates initial markdown files with metadata
  cleanUpFiles.ts: Removes temporary files after processing
Transcription Services (src/transcription)
  whisper.ts: Uses Whisper.cpp, openai-whisper, or whisper-diarization for transcription
  deepgram.ts: Integrates Deepgram transcription service
  assembly.ts: Integrates AssemblyAI transcription service
Language Models (src/llms)
  chatgpt.ts: Integrates OpenAI's GPT models
  claude.ts: Integrates Anthropic's Claude models
  gemini.ts: Integrates Google's Gemini models
  cohere.ts: Integrates Cohere's language models
  mistral.ts: Integrates Mistral AI's language models
  fireworks.ts: Integrates Fireworks's open source models
  together.ts: Integrates Together's open source models
  groq.ts: Integrates Groq's open source models
  prompt.ts: Defines the prompt structure for summarization and chapter generation
Web Interface (web) and Server (server)
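To illustrate how the entry point might hand work to the command processors, here is a small dispatch sketch. It is not the code in src/autoshow.ts: the `PROCESSORS` table and `selectProcessor` function are hypothetical, the option-to-file mapping is inferred from the file names above, and the rule that exactly one input option is given per run is an assumption.

```typescript
// Option names come from the CLI examples; the mapping to files in src/commands
// is inferred from their names and is an assumption.
const PROCESSORS = {
  video: "processVideo.ts",
  playlist: "processPlaylist.ts",
  urls: "processURLs.ts",
  file: "processFile.ts",
  rss: "processRSS.ts",
} as const;

type InputOption = keyof typeof PROCESSORS;
type InputOptions = Partial<Record<InputOption, string>>;

// Picks the processor for the single input option that was supplied.
// Throws when zero or several input options are present (assumed rule).
function selectProcessor(options: InputOptions): { module: string; input: string } {
  const given = (Object.keys(PROCESSORS) as InputOption[]).filter(
    (name) => options[name] !== undefined,
  );
  const [name] = given;
  if (given.length !== 1 || name === undefined) {
    throw new Error(`Expected exactly one input option, got ${given.length}`);
  }
  return { module: PROCESSORS[name], input: options[name] as string };
}

console.log(selectProcessor({ rss: "https://ajcwebdev.substack.com/feed" }));
```

Keeping the mapping in one table means a new input type needs only one new entry and one new file in src/commands.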