VER-208: [Backend] Implement Reliable Timestamped Transcription for Snippet Detection

As a developer, I want to redesign our audio processing pipeline so that it generates reliable timestamped transcriptions of disinformation snippets without relying on OpenAI Whisper, so that we reduce operational costs.

Tasks:

Replace OpenAI Whisper with a cost-effective transcription solution for both the entire 30-minute audio file and its 5-minute segments.
Segment a 30-minute audio file into six 5-minute segments for targeted transcription and analysis.
Transcribe each segment and the entire audio file using only the Gemini LLM.
Develop a method to match snippet transcriptions with segment transcriptions to identify which segments contain disinformation snippets.
Calculate precise timestamps for the start and end of snippets, narrowing down the time range using segment transcriptions.
Implement further segmentation (e.g., 2-minute segments) to refine snippet timestamp accuracy if needed.
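
A minimal sketch of the segmentation and matching tasks above, assuming the transcriptions are already available as plain strings (the Gemini transcription call itself is not shown). The function names, the word-overlap scoring, and the 0.3 threshold are illustrative assumptions, not decisions made in this ticket:

```python
from difflib import SequenceMatcher

PUNCTUATION = ".,!?;:\"'"


def segment_bounds(total_seconds, segment_seconds):
    """Return (start, end) pairs in seconds that cover the whole file."""
    return [
        (start, min(start + segment_seconds, total_seconds))
        for start in range(0, total_seconds, segment_seconds)
    ]


def normalize(text):
    """Lowercase the text and split it into words without punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def overlap_ratio(snippet, segment):
    """Fraction of the snippet's words that align with the segment text."""
    a, b = normalize(snippet), normalize(segment)
    if not a:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(a)


def matching_segments(snippet, segment_texts, threshold=0.3):
    """Return 1-based numbers of segments that likely contain the snippet.

    The threshold is an assumed starting value and would need tuning
    against real transcriptions.
    """
    return [
        number
        for number, text in enumerate(segment_texts, start=1)
        if overlap_ratio(snippet, text) >= threshold
    ]
```

For a 30-minute file, `segment_bounds(1800, 300)` yields the six 5-minute windows. A snippet that straddles a segment boundary scores partially against each neighbouring segment, which is why the threshold sits well below 1.0 in this sketch.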

Acceptance Criteria:

[ ] The pipeline no longer uses OpenAI Whisper and incorporates a more cost-effective transcription solution.
[ ] Transcriptions for the entire audio file and each 5-minute segment are accurately generated.
[ ] The system identifies segments containing disinformation snippets and calculates reliable start and end timestamps.
[ ] The pipeline supports further segmentation (e.g., 2-minute segments) to refine timestamp precision when the 5-minute segments are too coarse.
[ ] Documentation is updated to reflect changes in the transcription process and pipeline structure.
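
One possible shape for the further-segmentation criterion, sketched under assumptions: `contains_snippet` is a hypothetical callback that would transcribe a sub-segment and match it against the snippet; here it is left to the caller, and the 2-minute default mirrors the example size from the tasks.

```python
def refine_window(window, contains_snippet, sub_seconds=120):
    """Narrow a coarse (start, end) window using smaller sub-segments.

    window           -- (start, end) in seconds from the 5-minute pass
    contains_snippet -- hypothetical callback taking (start, end) and
                        returning True if that sub-segment's transcription
                        matches the snippet
    sub_seconds      -- sub-segment length; 120 is an assumed default
    """
    start, end = window
    sub_segments = [
        (s, min(s + sub_seconds, end)) for s in range(start, end, sub_seconds)
    ]
    hits = [seg for seg in sub_segments if contains_snippet(*seg)]
    if not hits:
        # No sub-segment matched, so keep the coarse window unchanged.
        return window
    return hits[0][0], hits[-1][1]
```

The refined window is never wider than the coarse one, so running this step can only keep or tighten the reported range.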

Notes:

Example: If a snippet matches transcriptions from segments 2, 3, and 4, the solution should report that the snippet lies between approximately 05:00 (the start of segment 2) and 20:00 (the end of segment 4).
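
The arithmetic behind this example can be sketched as follows, assuming 1-based segment numbers and equal 5-minute segments. The function names are illustrative only:

```python
def snippet_time_range(matched_segments, segment_seconds=300):
    """Return the (start, end) window in seconds spanned by the segments."""
    if not matched_segments:
        raise ValueError("no matching segments")
    first, last = min(matched_segments), max(matched_segments)
    # Segment n covers [(n - 1) * length, n * length).
    return (first - 1) * segment_seconds, last * segment_seconds


def format_mmss(seconds):
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


start, end = snippet_time_range([2, 3, 4])
print(format_mmss(start), format_mmss(end))  # prints "05:00 20:00"
```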

PublicDataWorks / verdad-frontend