PublicDataWorks / verdad-frontend

MIT License

VER-218 [Backend] Enhance Timestamped Transcript Generator for Scalability and Reliability #192

Closed: nhphong closed this issue 2 days ago

nhphong commented 2 days ago

We want to improve the timestamped transcript generator so that the system can handle multiple transcription tasks concurrently, with better error handling and logging.

Acceptance Criteria:

  1. Memory Optimization:
    • Refactor the current implementation to reduce memory usage.
    • Ensure that the application can spawn multiple workers efficiently, allowing concurrent processing of several audio files without excessive memory consumption.
  2. Gemini Model Error Handling:
    • Implement a mechanism to reduce how often requests fail with the "Service currently unavailable" error from the Gemini model.
    • Introduce retry logic with exponential backoff for handling transient errors from the Gemini model.
  3. Timeout Management:
    • Add a timeout handler for requests to the Gemini model.
    • Implement fallback or retry strategies to manage timeouts gracefully.
  4. Logging Enhancements:
    • Increase the granularity and detail of logging throughout the transcription process.
    • Ensure that logs capture key events and errors to facilitate easier debugging and monitoring.
  5. Scalability Considerations:
    • Investigate and implement scaling strategies using fly.io to handle increased load during the transcription step.
    • Ensure that the system can dynamically scale out to additional machines as needed.
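
For item 2 (Gemini Model Error Handling), retry with exponential backoff usually means waiting roughly 1s, 2s, 4s, ... between attempts, capped at a maximum, with random jitter so that many workers do not retry in lockstep. This is a minimal sketch assuming an asyncio backend; `TransientModelError` is a hypothetical exception the caller would raise when the model reports an error such as "Service currently unavailable". It is not part of any Gemini SDK.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransientModelError(Exception):
    """Hypothetical wrapper for retryable errors, e.g. 'Service currently unavailable'."""


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (TransientModelError,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await call(), retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The cap doubles each attempt (base_delay, 2x, 4x, ...) up to max_delay;
            # sleeping a random fraction of it ("full jitter") spreads retries out.
            cap = min(max_delay, base_delay * (2 ** (attempt - 1)))
            await sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")
```

Errors not listed in `retry_on` (for example, an invalid request) propagate immediately, since retrying them would only waste quota. The injectable `sleep` parameter exists so the backoff can be tested without real waiting.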
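
For item 3 (Timeout Management), a timeout handler can wrap each model request in a deadline and, on expiry, hand off to a fallback. The sketch below assumes asyncio; what the fallback actually does (a retry, a smaller audio chunk, another model) is a hypothetical choice left to the caller.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")
logger = logging.getLogger("transcriber.timeout")


async def call_with_timeout(
    primary: Callable[[], Awaitable[T]],
    timeout_s: float,
    fallback: Optional[Callable[[], Awaitable[T]]] = None,  # hypothetical fallback strategy
) -> T:
    """Run primary() under a deadline; on timeout, try fallback() under the same deadline."""
    try:
        # wait_for cancels the pending request once the deadline passes.
        return await asyncio.wait_for(primary(), timeout=timeout_s)
    except asyncio.TimeoutError:
        if fallback is None:
            raise  # no fallback configured: let the caller decide
        logger.warning("primary request timed out after %.1fs; trying fallback", timeout_s)
        return await asyncio.wait_for(fallback(), timeout=timeout_s)
```

Because the timeout surfaces as an ordinary exception, this wrapper can be combined with backoff retries by listing `asyncio.TimeoutError` among the retryable errors.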
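
For item 4 (Logging Enhancements), one way to make logs easier to follow across concurrent jobs is to attach a job ID and file name to every record and log start, finish, duration, and failure. This sketch uses only Python's standard `logging` module; the function name, the `job_id`/`audio_file` fields, and the log format are illustrative assumptions.

```python
import logging
import time
from typing import Callable, List, Optional

# Example format; every record passing through a handler using it must carry
# job_id and audio_file, which the LoggerAdapter below guarantees.
LOG_FORMAT = "%(asctime)s %(levelname)s job=%(job_id)s file=%(audio_file)s %(message)s"


def transcribe_with_logging(
    job_id: str,
    audio_file: str,
    transcribe: Callable[[str], List[str]],  # hypothetical: returns transcript segments
    logger: Optional[logging.Logger] = None,
) -> List[str]:
    """Run one transcription, logging key events with per-job context."""
    # LoggerAdapter injects the dict below as `extra` on every record it emits.
    log = logging.LoggerAdapter(
        logger or logging.getLogger("transcriber"),
        {"job_id": job_id, "audio_file": audio_file},
    )
    log.info("transcription started")
    start = time.monotonic()
    try:
        segments = transcribe(audio_file)
    except Exception:
        # exception() logs at ERROR level and includes the traceback.
        log.exception("transcription failed after %.2fs", time.monotonic() - start)
        raise
    log.info(
        "transcription finished in %.2fs with %d segments",
        time.monotonic() - start,
        len(segments),
    )
    return segments
```

To see the context in output, attach a handler with `logging.Formatter(LOG_FORMAT)` to the `transcriber` logger; records that lack `job_id` should go through a different handler, since the format references those fields.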
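
For item 5 (Scalability Considerations), a scaling strategy needs some rule for how many machines to run. The sketch below is a hypothetical heuristic, not a fly.io feature: it sizes the fleet from the number of pending transcription jobs and a per-machine capacity, clamped to configured bounds. Applying the result (for example with `fly scale count`) is deliberately left out.

```python
import math


def desired_machine_count(
    queue_depth: int,
    jobs_per_machine: int,
    min_machines: int = 1,
    max_machines: int = 10,
) -> int:
    """Return how many worker machines to run for the current backlog (hypothetical rule)."""
    if queue_depth < 0:
        raise ValueError("queue_depth cannot be negative")
    if jobs_per_machine < 1:
        raise ValueError("jobs_per_machine must be at least 1")
    if min_machines > max_machines:
        raise ValueError("min_machines cannot exceed max_machines")
    # Enough machines to give every pending job a worker slot...
    needed = math.ceil(queue_depth / jobs_per_machine)
    # ...but never below the floor (warm capacity) or above the cost ceiling.
    return max(min_machines, min(max_machines, needed))
```

Here `jobs_per_machine` would naturally match the worker-pool size each machine can sustain within its memory limit, which ties the scaling rule back to the memory work in item 1.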

Tasks:

Additional Notes:
