PublicDataWorks / verdad-frontend

MIT License
VER-208: [Backend] Implement Reliable Timestamped Transcription for Snippet Detection #180

Open nhphong opened 2 hours ago

nhphong commented 2 hours ago

As a developer, I want to redesign our audio processing pipeline to generate reliable timestamped transcriptions of disinformation snippets without relying on OpenAI Whisper, thereby reducing operational costs.

Tasks:

  1. Replace OpenAI Whisper with a cost-effective transcription solution for both the entire 30-minute audio file and its 5-minute segments.
  2. Segment a 30-minute audio file into six 5-minute segments for targeted transcription and analysis.
  3. Transcribe each segment and the entire audio file using only the Gemini LLM.
  4. Develop a method to match snippet transcriptions with segment transcriptions to identify which segments contain disinformation snippets.
  5. Calculate precise timestamps for the start and end of snippets, narrowing down the time range using segment transcriptions.
  6. Implement further segmentation (e.g., 2-minute segments) to refine snippet timestamp accuracy if needed.
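
A rough sketch of how tasks 2, 4 and 5 might fit together is given below. It is not the project's implementation. All function names (`segment_bounds`, `locate_snippet`, `estimate_timestamps`) are hypothetical, and the transcripts are assumed to have been produced already (for example by Gemini), so no transcription call is shown. Matching uses word-level fuzzy alignment from Python's standard `difflib`, and timestamps are interpolated under the assumption of a roughly uniform speaking rate within a segment, which is only an approximation.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def segment_bounds(total_s: float, segment_s: float) -> list[tuple[float, float]]:
    """Split [0, total_s) into consecutive windows of at most segment_s seconds."""
    bounds = []
    start = 0.0
    while start < total_s:
        end = min(start + segment_s, total_s)
        bounds.append((start, end))
        start = end
    return bounds


def normalize(text: str) -> list[str]:
    """Lowercase and drop punctuation so minor transcription differences matter less."""
    cleaned = "".join(ch.lower() if ch.isalnum() or ch.isspace() else " " for ch in text)
    return cleaned.split()


def locate_snippet(snippet: str, segment_text: str) -> tuple[float, int, int]:
    """Return (score, first_word, last_word_exclusive) for the snippet inside a segment.

    score is the fraction of snippet words that align, in order, with segment words.
    """
    snip, seg = normalize(snippet), normalize(segment_text)
    if not snip or not seg:
        return 0.0, 0, 0
    matcher = SequenceMatcher(None, seg, snip, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return 0.0, 0, 0
    matched = sum(b.size for b in blocks)
    return matched / len(snip), blocks[0].a, blocks[-1].a + blocks[-1].size


def estimate_timestamps(snippet: str, segments: list[tuple[float, float, str]]):
    """Pick the best-matching segment and interpolate start/end times by word position.

    segments holds (start_s, end_s, transcript) tuples. Assumes a uniform speaking
    rate inside each segment. Returns (start_s, end_s, score), or None if nothing matches.
    """
    best = None
    for start_s, end_s, text in segments:
        score, first, last = locate_snippet(snippet, text)
        if best is None or score > best[0]:
            best = (score, start_s, end_s, first, last, len(normalize(text)))
    if best is None or best[0] == 0 or best[5] == 0:
        return None
    score, start_s, end_s, first, last, n_words = best
    per_word = (end_s - start_s) / n_words
    return start_s + first * per_word, start_s + last * per_word, score
```

For a 30-minute file, `segment_bounds(1800, 300)` yields the six 5-minute windows from task 2. Calling `estimate_timestamps` again with 2-minute windows, as in task 6, shortens the span over which the uniform-rate assumption has to hold. A snippet that straddles a segment boundary is not handled by this sketch.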

Acceptance Criteria:

Notes:
