This PR integrates Google Cloud Storage (GCS) functionality into the end-to-end speech workflow, enabling cloud-based file upload and retrieval.
Goal
The goal of this PR is to enhance the existing speech-to-speech workflow by using cloud storage, which removes the need for local disk operations and simplifies file management.
Changes
Implemented two new functions in core_backend/app/utils for GCS file upload and retrieval.
Integrated these functions into the stt-llm-response endpoint and generate_speech component.
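For reviewers unfamiliar with the GCS client, the two helpers could look roughly like the sketch below. This is not the code in this PR: the function names, parameters, default content type, and expiry are assumptions for illustration. The bucket object is passed in so the helpers can be exercised with a stub; only `bucket.blob`, `Blob.upload_from_file`, and `Blob.generate_signed_url` from `google-cloud-storage` are relied on.

```python
from datetime import timedelta
from typing import Any, BinaryIO


def upload_file_to_gcs(
    bucket: Any,
    file_obj: BinaryIO,
    blob_name: str,
    content_type: str = "audio/mpeg",  # assumed default for MP3 output
) -> None:
    """Upload an open binary stream to `bucket` under `blob_name`.

    Hypothetical helper: `bucket` is expected to behave like
    google.cloud.storage.Bucket.
    """
    blob = bucket.blob(blob_name)
    # rewind=True seeks to the start of the stream before uploading,
    # so a stream that was already read is still uploaded in full.
    blob.upload_from_file(file_obj, rewind=True, content_type=content_type)


def generate_signed_url(
    bucket: Any,
    blob_name: str,
    expiration_minutes: int = 15,  # assumed expiry, not taken from the PR
) -> str:
    """Return a time-limited, read-only URL for `blob_name`."""
    blob = bucket.blob(blob_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(minutes=expiration_minutes),
        method="GET",
    )
```

With the real library, the bucket would typically come from `storage.Client().bucket("aaq-speech-test")`, which requires valid Google Cloud credentials in the environment.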
How has this been tested?
Dev environment
Docker Compose
Unit tests
How to test this?
Set the appropriate GCS variables from template.core_backend.env to match your GCS bucket name (the default is "aaq-speech-test").
Send a POST request to the stt-llm-response endpoint.
If generate_tts is set to true in the QueryResponse Schema, a signed URL will be generated. Access this URL to play the generated MP3 file for text-to-speech output.
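To check the last step without a media player, a small script along these lines may help. It is a sketch, not part of this PR: the function names are hypothetical, and the check is only a heuristic that looks for an ID3v2 tag or an MPEG audio frame sync in the first bytes of the download.

```python
import urllib.request


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic check on the leading bytes of a file."""
    # Many MP3 files start with an ID3v2 tag.
    if data[:3] == b"ID3":
        return True
    # Otherwise expect an MPEG audio frame sync: 11 consecutive set bits.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def fetch_and_check(signed_url: str, timeout: float = 10.0) -> bool:
    """Download the first bytes behind `signed_url` and sniff them."""
    with urllib.request.urlopen(signed_url, timeout=timeout) as resp:
        ok = resp.status == 200
        head = resp.read(16)
    return ok and looks_like_mp3(head)
```

Call `fetch_and_check` with the signed URL copied from the endpoint response. An expired or malformed URL makes `urlopen` raise `urllib.error.HTTPError` rather than return `False`.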
Checklist
Fill with x for completed.
[x] My code follows the style guidelines of this project
[x] I have reviewed my own code to ensure good quality
[x] I have tested the functionality of my code to ensure it works as intended
Reviewer: @markbotterill Estimate: 20 mins
Ticket
Fixes: JIRA_TICKET_LINK