PublicDataWorks / verdad-frontend

MIT License

VER-210: [Backend] Replace OpenAI Whisper with Gemini Pro for Transcription Phase in Audio Processing Pipeline #182

Open nhphong opened 1 day ago

nhphong commented 1 day ago

As a developer, I want to replace the OpenAI Whisper component with Gemini Pro for the transcription phase in our audio processing pipeline so that we can reduce costs and maintain transcription reliability.

Acceptance Criteria:

  1. Integration with Existing Pipeline:
    • Replace the OpenAI Whisper component with the Gemini Pro transcription solution.
    • Ensure seamless integration without disrupting the current workflow.
  2. Functionality Verification:
    • Verify that the new transcription method provides timestamped transcriptions with accuracy comparable to or better than OpenAI Whisper.
    • Ensure the pipeline can handle 30-minute audio files using whichever approach is selected: splitting the audio into shorter chunks, or sending the whole file in a single prompt and accepting reduced accuracy.
  3. Cost Analysis:
    • Analyze and document cost savings achieved by switching to Gemini Pro.
    • Ensure that the new solution aligns with budgetary goals.
  4. Testing and Quality Assurance:
    • Conduct comprehensive testing to ensure the accuracy and reliability of transcriptions in the updated pipeline.
    • Validate that all pipeline components interact correctly with the new transcription method.
  5. Documentation and Training:
    • Update system documentation to reflect the changes made in the transcription process.
    • Provide training or guidance to the team on the new transcription integration.
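
One way to make "accuracy comparable to or better than OpenAI Whisper" measurable is word error rate (WER) against a human-checked reference transcript. The sketch below is only an illustration: the function name `word_error_rate` is hypothetical, the choice of WER as the metric is an assumption rather than something this issue specifies, and the normalization (lowercasing and whitespace splitting) is deliberately simplistic.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference word count.

    Hypothetical helper: tokenization is lowercase + whitespace split, which
    ignores punctuation handling a real evaluation would likely need.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Running both the Whisper output and the Gemini output through the same function against the same reference would give two comparable numbers, where a lower value means fewer word-level errors.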

Tasks:

Notes:
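
If the split-audio approach from the acceptance criteria is chosen, timestamps returned for each chunk will be relative to the start of that chunk and need to be shifted back onto the original timeline. The following is a minimal sketch of that bookkeeping only. The names `Segment`, `plan_chunks`, and `merge_segments` are hypothetical, the chunk length is an arbitrary parameter, and the actual transcription call is left out because this issue does not specify how it will be made.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of transcript (times in seconds)."""
    start: float
    end: float
    text: str


def plan_chunks(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) windows."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


def merge_segments(
    chunks: list[tuple[float, float]],
    per_chunk_segments: list[list[Segment]],
) -> list[Segment]:
    """Shift chunk-relative timestamps onto the timeline of the full file."""
    merged = []
    for (chunk_start, _chunk_end), segments in zip(chunks, per_chunk_segments):
        for seg in segments:
            merged.append(
                Segment(seg.start + chunk_start, seg.end + chunk_start, seg.text)
            )
    return merged
```

For a 30-minute file, `plan_chunks(1800, 600)` yields three 10-minute windows. This sketch cuts at fixed boundaries, so a word that straddles a boundary could be split; overlapping windows or cutting on silence would be possible refinements, at the cost of extra deduplication logic.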

linear[bot] commented 1 day ago

VER-210 [Backend] Replace OpenAI Whisper with Gemini Pro for Transcription Phase in Audio Processing Pipeline