[128] - Refactoring of the E2E speech endpoint

Reviewer: @amiraliemami Estimate: 20 min

Ticket

Description

This PR implements a new end-to-end speech workflow design via the voice-search endpoint. The change isolates speech functionality, making it more intuitive and easier to implement.

Goal

To create a dedicated, isolated endpoint for the end-to-end speech workflow, with an optional generate_tts flag. When the flag is set, a voice note is generated from the LLM response, so callers get the full speech round trip from a single endpoint.

Changes

Introduced a new Pydantic model AudioResponse, inheriting from QueryResponse.
Implemented the voice-search endpoint, capable of processing voice files and optionally returning voice responses.
Created a generate_tts__after decorator, which requires an llm-response in the response object to execute.
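
A minimal sketch of how the decorator and the new response model might fit together. This is not the code in the PR: plain dataclasses stand in for the Pydantic models so the sketch runs without dependencies, and the field names (query_id, llm_response, tts_filepath), the fake_text_to_speech helper, and the way generate_tts is passed are all assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from functools import wraps
from typing import Optional


@dataclass
class QueryResponse:
    """Stand-in for the existing response model (fields are assumed)."""
    query_id: int
    llm_response: Optional[str] = None


@dataclass
class AudioResponse(QueryResponse):
    """Adds an optional reference to the generated voice note."""
    tts_filepath: Optional[str] = None  # hypothetical field name


async def fake_text_to_speech(text: str) -> str:
    """Hypothetical TTS call; a real one would synthesize and store audio."""
    return f"voice_note_{len(text)}.mp3"


def generate_tts__after(func):
    """Run the wrapped handler first, then optionally attach a voice note."""
    @wraps(func)
    async def wrapper(*args, generate_tts: bool = False, **kwargs):
        response = await func(*args, **kwargs)
        if not generate_tts:
            return response  # flag off: pass the response through unchanged
        if not response.llm_response:
            # The decorator needs LLM text to convert into speech.
            raise ValueError("llm_response is required to generate speech")
        path = await fake_text_to_speech(response.llm_response)
        return AudioResponse(
            query_id=response.query_id,
            llm_response=response.llm_response,
            tts_filepath=path,
        )
    return wrapper


@generate_tts__after
async def answer(query_id: int, text: str) -> QueryResponse:
    # Placeholder for the transcription + LLM steps of the real endpoint.
    return QueryResponse(query_id=query_id, llm_response=f"Answer to: {text}")


with_audio = asyncio.run(answer(1, "hello", generate_tts=True))
text_only = asyncio.run(answer(1, "hello"))
```

With the flag set, the caller receives an AudioResponse; without it, the original QueryResponse is returned untouched, which keeps the speech step strictly opt-in.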

How has this been tested?

Dev environment, docker-compose, unit tests

How to test this?

Configure the Speech_Api environment variables in template.core_backend.env.
Launch the Docker containers using: docker compose -f docker-compose.yml -f docker-compose.dev.yml -f docker-compose.speech.yml -p aaq-stack watch
Send a POST request to the voice-search endpoint.
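
For the last step, the request could be built roughly as follows. This is a sketch under several assumptions that are not stated in this PR: the base URL, the /voice-search path, the multipart field name "file", passing generate_tts as a query parameter, and bearer-token authentication. Adjust each of these to match the actual endpoint signature.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_voice_search_request(
    base_url: str,
    audio: bytes,
    filename: str,
    token: str,
    generate_tts: bool = False,
) -> Request:
    """Build (but do not send) a multipart POST carrying one audio file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "file" is an assumed form field name.
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    # Assumed: the flag is read from the query string.
    query = urlencode({"generate_tts": str(generate_tts).lower()})
    return Request(
        url=f"{base_url.rstrip('/')}/voice-search?{query}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_voice_search_request(
    "http://localhost:8000",  # placeholder host and port
    b"fake-audio-bytes",
    "question.wav",
    "my-api-key",
    generate_tts=True,
)
# To send it against a running stack: urllib.request.urlopen(request)
```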

To-do before merge (optional)

Merge this into main once VoiceApi/GCP has been merged.

Checklist

Fill with x for completed.

[x] My code follows the style guidelines of this project
[x] I have reviewed my own code to ensure good quality
[x] I have tested the functionality of my code to ensure it works as intended
[x] I have resolved merge conflicts
[x] I have updated the automated tests

IDinsight / ask-a-question