[Backend] Generate Transcription for All Recorded Audio Clips

We want to re-implement Stage 1 of the audio processing pipeline using the new approach: transcribe every audio clip first, then run disinformation detection on the transcript,

So that we can improve the accuracy of disinformation detection and support auditing by retaining the transcripts of all audio clips.

Acceptance Criteria:

Transcription:
- Use the Whisper API to transcribe every audio clip.
- Ensure that the generated transcription is timestamped.
Disinformation Detection:
- Feed the timestamped transcription into Gemini Flash for potential disinformation detection.
- Produce the Stage 1 output based on the detection results.
Audit Capability:
- Store the raw text of every transcription so that Stage 1 results can be audited in full.
- Ensure that transcripts of all audio clips, including those that do not contain disinformation, are retained.
Cost Consideration:
- Implement the new approach first; the increased cost of transcribing every clip is accepted for now.
- Document the cost implications and prepare to optimize if budget constraints arise in the future.
Documentation and Testing:
- Update any relevant documentation to reflect the new approach.
- Conduct tests to verify the accuracy and reliability of the new process.
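
The criteria above could be wired together roughly as follows. This is a minimal sketch, not the actual implementation: `Segment`, `Stage1Result`, `format_timestamped`, and `run_stage1` are hypothetical names, and the `transcribe`, `detect`, and `store` callables stand in for the Whisper API call, the Gemini Flash call, and the transcript store. The `[MM:SS]` line format is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One timestamped piece of a transcript (hypothetical shape)."""
    start: float  # seconds from the start of the clip
    end: float
    text: str


@dataclass
class Stage1Result:
    clip_id: str
    transcript: str  # timestamped text that was sent to the detector
    flagged: bool    # True if the detector reported potential disinformation


def format_timestamped(segments: List[Segment]) -> str:
    """Render segments as '[MM:SS] text' lines, one line per segment."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.text.strip()}")
    return "\n".join(lines)


def run_stage1(
    clip_id: str,
    transcribe: Callable[[str], List[Segment]],  # placeholder for the Whisper call
    detect: Callable[[str], bool],               # placeholder for the Gemini Flash call
    store: Callable[[str, str], None],           # placeholder for transcript storage
) -> Stage1Result:
    segments = transcribe(clip_id)
    transcript = format_timestamped(segments)
    # Store before detection so the transcript is retained whatever the verdict is.
    store(clip_id, transcript)
    flagged = detect(transcript)
    return Stage1Result(clip_id=clip_id, transcript=transcript, flagged=flagged)
```

Storing the transcript before calling the detector means a clip that is not flagged, or a detector call that fails, still leaves a transcript behind for audit.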

Notes:

We might later implement a preliminary filtering step using Gemini Flash to reduce processing costs, but this is not a current priority.
Ensure that the system can handle large volumes of transcription data efficiently.
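
One possible way to handle volume is to process clips concurrently with a bounded worker pool, since the work is mostly waiting on remote APIs. The sketch below assumes a hypothetical per-clip function `process_one` and an arbitrary default of 4 workers; neither comes from the existing codebase.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def process_clips(
    clip_ids: Iterable[str],
    process_one: Callable[[str], object],  # hypothetical per-clip Stage 1 function
    max_workers: int = 4,                  # assumed default; tune against API rate limits
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Run process_one over all clips; collect results and errors separately."""
    results: Dict[str, object] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, cid): cid for cid in clip_ids}
        for future in as_completed(futures):
            cid = futures[future]
            try:
                results[cid] = future.result()
            except Exception as exc:
                # One failing clip should not abort the rest of the batch.
                errors[cid] = exc
    return results, errors
```

Returning errors per clip, instead of raising on the first failure, makes it possible to retry only the clips that failed.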

Tasks:

- Integrate Whisper API for transcription of all audio clips.
- Modify the existing pipeline to process transcriptions through Gemini Flash.
- Implement storage for complete transcription logs.
- Conduct testing to ensure the accuracy of disinformation detection with the new method.
- Document the new pipeline process and update any user guides or technical documentation.
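
For the storage task, a transcript table keyed by clip could look like the sketch below. It uses SQLite from the Python standard library purely for illustration; the table name, the column names, and the choice of SQLite are all assumptions, and the real backend may use a different store.

```python
import sqlite3

# Hypothetical schema: one row per clip, kept whether or not the clip was flagged.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    clip_id    TEXT PRIMARY KEY,
    raw_text   TEXT NOT NULL,
    flagged    INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the transcript store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_transcript(conn: sqlite3.Connection, clip_id: str, raw_text: str, flagged: bool) -> None:
    """Insert the transcript, replacing any earlier row for the same clip."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO transcripts (clip_id, raw_text, flagged) VALUES (?, ?, ?)",
            (clip_id, raw_text, int(flagged)),
        )


def count_transcripts(conn: sqlite3.Connection) -> int:
    """Return the number of stored transcripts."""
    return conn.execute("SELECT COUNT(*) FROM transcripts").fetchone()[0]
```

Keeping `flagged` next to the raw text lets an auditor query clips that were not flagged as easily as those that were.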
