The front end will deposit uploads to a specified GCS bucket. This task is to build a pipeline that takes these uploads and passes them to the ASR.
To create a Google Cloud Function that watches a bucket for new files and sends them to a server at the provided request URL, you'll need to write a function that triggers on google.storage.object.finalize, which fires when a new file is uploaded to a Google Cloud Storage bucket. The function then sends a POST request to your specified server with the file content.
Additionally, the function should save the JSON response to another bucket. Below is a function that handles both the upload and the response:
Here's an example implementation in Python:
import os
import requests
from google.cloud import storage


def send_file_to_server_and_save_response(event, context):
    """Triggered by a new file uploaded to a specified Google Cloud Storage bucket.

    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    file_name = event['name']
    bucket_name = event['bucket']

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(file_name)

    # Download the file to a temporary location; basename() guards against
    # object names that contain path separators.
    temp_file_path = os.path.join('/tmp', os.path.basename(file_name))
    blob.download_to_filename(temp_file_path)

    url = "http://35.245.63.100:9000/asr?task=transcribe&encode=true&output=json&diarize=true"

    try:
        # Open the file in a context manager so the handle is always closed
        with open(temp_file_path, 'rb') as audio_file:
            files = {'audio_file': (file_name, audio_file, 'video/webm')}
            response = requests.post(url, files=files)

        if response.status_code == 200:
            print(f"File {file_name} was successfully sent to the server.")
            response_data = response.content

            # Define the bucket to save the JSON response
            response_bucket_name = '<YOUR_RESPONSE_BUCKET>'
            response_bucket = storage_client.bucket(response_bucket_name)

            # Define the path and name for the response file
            response_file_name = file_name + '.json'
            response_blob = response_bucket.blob(response_file_name)

            # Upload the JSON response to the specified bucket
            response_blob.upload_from_string(response_data, content_type='application/json')
            print(f"Response JSON for {file_name} saved to bucket {response_bucket_name} as {response_file_name}.")
        else:
            print(f"Failed to send file {file_name} to the server (status {response.status_code}).")
    finally:
        # Clean up the temporary file even if the request failed
        os.remove(temp_file_path)
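Before deploying, it can help to sanity-check the handler's input shape. A minimal sketch of the finalize event payload follows; the field values here are hypothetical, and the real payload carries the full GCS object metadata:

```python
# Minimal sketch of a google.storage.object.finalize event payload.
# Values are hypothetical; the real event includes many more object fields.
sample_event = {
    'name': 'recording-001.webm',   # object path within the bucket
    'bucket': 'my-trigger-bucket',  # hypothetical trigger bucket name
    'contentType': 'video/webm',
    'size': '1048576',
}

# The function above reads only these two fields:
file_name = sample_event['name']
bucket_name = sample_event['bucket']
print(file_name, bucket_name)
```

Since the handler reads only name and bucket, any payload carrying those two keys exercises the same code path.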
To deploy this function, follow these steps:
Ensure you have the Google Cloud SDK installed and initialized.
Create a requirements.txt file with the following contents to specify the dependencies:
google-cloud-storage
requests
Deploy the function to Google Cloud Functions with gcloud functions deploy, replacing <YOUR_TRIGGER_BUCKET> and <YOUR_RESPONSE_BUCKET> with the names of your Google Cloud Storage buckets.
Make sure to replace <YOUR_FUNCTION_REGION> with the region where you want to deploy your Cloud Function.
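The deploy command might look like the following sketch; the runtime version is an assumption, and the angle-bracket values are placeholders to substitute:

```shell
# Sketch of a 1st-gen Cloud Functions deploy; <...> values are placeholders
gcloud functions deploy send_file_to_server_and_save_response \
  --runtime python311 \
  --trigger-event google.storage.object.finalize \
  --trigger-resource <YOUR_TRIGGER_BUCKET> \
  --region <YOUR_FUNCTION_REGION> \
  --source .
```

The deployed name doubles as the entry point here, since it matches the Python function name.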
Test the function by uploading a file to your specified bucket and checking the logs for successful execution.
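One way to smoke-test, assuming gsutil is installed and sample.webm is any local file of the expected type; bucket names are placeholders:

```shell
# Upload a test file to the trigger bucket
gsutil cp sample.webm gs://<YOUR_TRIGGER_BUCKET>/

# Check the function's recent logs for the success message
gcloud functions logs read send_file_to_server_and_save_response --limit 20

# Confirm the JSON response landed in the response bucket
gsutil ls gs://<YOUR_RESPONSE_BUCKET>/
```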
Remember to ensure that the Cloud Function has the necessary IAM permissions to access the Google Cloud Storage buckets, as well as outbound internet access if it runs in a VPC-scoped environment.
Additionally, ensure that the Cloud Function's service account has the Storage Object Creator role (or a custom role with equivalent permissions) on the response bucket to allow it to write the response files.
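Granting that role can be sketched with gsutil iam ch; the service-account email and project ID below are placeholders:

```shell
# Allow the function's service account to create objects in the response
# bucket (service-account email and project ID are placeholders)
gsutil iam ch \
  "serviceAccount:<FUNCTION_SERVICE_ACCOUNT>@<PROJECT_ID>.iam.gserviceaccount.com:roles/storage.objectCreator" \
  gs://<YOUR_RESPONSE_BUCKET>
```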
The current code bypasses the Google bucket and interacts with the ASR server directly. We decided to try doing all the workflow logic on the client side instead.