dmarx / video-killed-the-radio-star

Notebook and tools for end-to-end automation of music video production with generative AI
https://colab.research.google.com/github/dmarx/video-killed-the-radio-star/blob/main/Video_Killed_The_Radio_Star_Defusion.ipynb#scrollTo=oPbeyWtesAoh
MIT License
198 stars 35 forks source link

[feature] prompt2narrative #143

Open dmarx opened 1 year ago

dmarx commented 1 year ago
  1. user describes aspects of the video they want to see in natural language
  2. whisper parses transcript
  3. transcript + user prompt -> GPT + system prompt: "generate a storyboard..."
  4. new story board returned to user for intervention or start generation

can potentially include audio features? if using multimodal LLM api, could provide mel spectrogram(s)