amazon-connect / amazon-connect-realtime-transcription

Transcribe Live Customer Audio for Amazon Connect using Amazon Kinesis Video Streams and Amazon Transcribe
MIT No Attribution

Amazon Connect IVR Recording

Making it easy to get started with Amazon Connect live audio streaming and real-time transcription using Amazon Transcribe.

Project Overview

This project provides an example solution to get you started with capturing and transcribing Amazon Connect audio using Kinesis Video Streams and Amazon Transcribe. The example Lambda functions can be used to build different solutions, such as capturing audio in the IVR and transcribing customer audio. To enable these use cases, several environment variables and parameters in the invocation event control the behavior of the Lambda function.

Architecture Overview

Description

This solution can be configured to use the following services: Amazon Connect, Amazon Kinesis Video Streams, Amazon Transcribe, Amazon DynamoDB, AWS Lambda, and Amazon S3.

With Amazon Connect, customer audio can be live streamed to Kinesis Video Streams, as described in the Amazon Connect documentation. This project serves as an example of how to consume an Amazon Connect live audio stream, capture the audio from each channel of the stream, send it to S3, and combine the audio into a single file. It also shows how to perform real-time transcription using Amazon Transcribe and post the transcriptions to a DynamoDB table.

In the diagram above, once a call is connected to Amazon Connect:

The Lambda code expects the Kinesis Video Stream details provided by the Amazon Connect Contact Flow, along with the Amazon Connect Contact Id. The handler function is in KVSTranscribeStreamingLambda.java; it calls the GetMedia API of Kinesis Video Streams to fetch the InputStream of the customer audio, which is then processed with the Parser Library provided by Amazon Kinesis Video Streams. If the transcriptionEnabled property is set to true on the input, a TranscribeStreamingRetryClient sends the audio bytes of the call to Amazon Transcribe. As transcript segments are returned, they are saved in a DynamoDB table with ContactId as the partition key and the segment's StartTime as the sort key. The audio bytes are also written to a file while the call is in progress; at the end of the call, if the saveCallRecording property is set to true on the input, the WAV audio file is uploaded to the S3 bucket named by RECORDINGS_BUCKET_NAME.
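To make the key layout of the transcript table concrete, the following is a minimal in-memory sketch, not the project's DynamoDB code. It only illustrates the effect of using ContactId as the partition key and StartTime as the sort key: segments for one contact come back ordered by start time. The class name TranscriptTableSketch, its methods, and the assumption that StartTime is a number of seconds are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the DynamoDB table; not part of the project.
class TranscriptTableSketch {
    // Partition key (ContactId) -> items ordered by sort key (StartTime, assumed numeric).
    private final Map<String, TreeMap<Double, String>> partitions = new TreeMap<>();

    // Stores one transcript segment under the composite key (ContactId, StartTime).
    void putSegment(String contactId, double startTime, String transcript) {
        partitions.computeIfAbsent(contactId, k -> new TreeMap<>()).put(startTime, transcript);
    }

    // Returns the segments of one contact in StartTime order,
    // similar in spirit to querying the table by its partition key.
    List<String> queryByContact(String contactId) {
        TreeMap<Double, String> items = partitions.get(contactId);
        return items == null ? new ArrayList<>() : new ArrayList<>(items.values());
    }

    public static void main(String[] args) {
        TranscriptTableSketch table = new TranscriptTableSketch();
        // Segments may arrive in any order; the sort key restores call order on read.
        table.putSegment("b0e14540-ca63-4205-b285-c6dde79bxxxx", 3.2, "I need help with my order.");
        table.putSegment("b0e14540-ca63-4205-b285-c6dde79bxxxx", 0.5, "Hello.");
        for (String line : table.queryByContact("b0e14540-ca63-4205-b285-c6dde79bxxxx")) {
            System.out.println(line);
        }
    }
}
```

With this layout, a consumer that wants the running transcript of a call only needs the Contact Id; it does not have to sort the segments itself.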

See the Amazon Transcribe streaming documentation for the latest supported languages.

Getting Started

Getting started with this project is easy. The most basic use case of capturing audio in the Amazon Connect IVR can be accomplished by downloading the pre-packaged Lambda Functions, deploying the CloudFormation template in your account, and importing the Contact Flows into your Amazon Connect Instance.

Easy Setup

Building the KVS Transcriber project

The Lambda code is designed to be built with Gradle. All requisite dependencies are captured in the build.gradle file. Run gradle build to produce the zip that can be deployed as an AWS Lambda application. After the build, the updated zip file is in the build/distributions folder; copy it to the deployment folder, then follow the Easy Setup steps above. The other files in the deployment folder are zip archives, each containing an individual file from the functions folder. The layer.zip file is produced by the Connect Audio Utils project; see the Audio Utils section below for more information.

Lambda Environment Variables

This Lambda Function has environment variables that control its behavior:

Sample Lambda Environment Variables
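As an illustration of how a Lambda function can read such a variable, here is a minimal sketch. It covers only RECORDINGS_BUCKET_NAME, the one variable named in the description above; the helper class LambdaConfigSketch and the choice to fail when the value is missing are assumptions, not the project's actual configuration code.

```java
import java.util.Map;

// Hypothetical helper for illustration; the project's own configuration code may differ.
class LambdaConfigSketch {
    // Variable name taken from the architecture description; other variables are not covered here.
    static final String RECORDINGS_BUCKET_VAR = "RECORDINGS_BUCKET_NAME";

    // Reads the bucket name from the given environment map and rejects missing or blank values.
    static String recordingsBucket(Map<String, String> env) {
        String bucket = env.get(RECORDINGS_BUCKET_VAR);
        if (bucket == null || bucket.trim().isEmpty()) {
            throw new IllegalStateException(RECORDINGS_BUCKET_VAR + " is not set");
        }
        return bucket.trim();
    }

    public static void main(String[] args) {
        try {
            // In a Lambda function the values come from the function's configuration.
            System.out.println("Recordings bucket: " + recordingsBucket(System.getenv()));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment in as a map, rather than calling System.getenv inside the method, keeps the lookup easy to exercise in a unit test.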

Lambda Invocation Event Details

This Lambda Function will need some details when invoked:

Sample Lambda Invocation Event

The following is a sample invocation event:

    {
        "streamARN": "arn:aws:kinesisvideo:us-east-1:6137874xxxxx:stream/kvsstreams-connect-demo-6855eee9-fa47-4b84-a970-ac6dbdd30b9d/1542430xxxxxx",
        "startFragmentNum": "9134385233318150666908441974200077706515712xxxx",
        "connectContactId": "b0e14540-ca63-4205-b285-c6dde79bxxxx",
        "transcriptionEnabled": "true",
        "saveCallRecording": "true",
        "languageCode": "en-US",
        "streamAudioFromCustomer": "true",
        "streamAudioToCustomer": "true"
    }
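Note that the flags in the sample event are JSON strings ("true"), not booleans. The sketch below shows one way a handler could turn such an event into typed values. The class InvocationEventSketch is hypothetical, and so are the choices to treat streamARN, startFragmentNum, and connectContactId as required, to read a missing flag as false, and to default languageCode to en-US; the project's actual request handling may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not the project's actual request class.
class InvocationEventSketch {
    final String streamARN;
    final String startFragmentNum;
    final String connectContactId;
    final boolean transcriptionEnabled;
    final boolean saveCallRecording;
    final boolean streamAudioFromCustomer;
    final boolean streamAudioToCustomer;
    final String languageCode;

    private InvocationEventSketch(Map<String, String> event) {
        // Assumption: the stream details and the contact id are mandatory.
        this.streamARN = require(event, "streamARN");
        this.startFragmentNum = require(event, "startFragmentNum");
        this.connectContactId = require(event, "connectContactId");
        // Flags arrive as the strings "true"/"false"; a missing flag is read as false here.
        this.transcriptionEnabled = Boolean.parseBoolean(event.get("transcriptionEnabled"));
        this.saveCallRecording = Boolean.parseBoolean(event.get("saveCallRecording"));
        this.streamAudioFromCustomer = Boolean.parseBoolean(event.get("streamAudioFromCustomer"));
        this.streamAudioToCustomer = Boolean.parseBoolean(event.get("streamAudioToCustomer"));
        // Assumption: fall back to en-US when no language code is supplied.
        this.languageCode = event.getOrDefault("languageCode", "en-US");
    }

    static InvocationEventSketch fromMap(Map<String, String> event) {
        return new InvocationEventSketch(event);
    }

    private static String require(Map<String, String> event, String key) {
        String value = event.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing required event field: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> event = new HashMap<>();
        event.put("streamARN", "arn:aws:kinesisvideo:us-east-1:6137874xxxxx:stream/kvsstreams-connect-demo-6855eee9-fa47-4b84-a970-ac6dbdd30b9d/1542430xxxxxx");
        event.put("startFragmentNum", "9134385233318150666908441974200077706515712xxxx");
        event.put("connectContactId", "b0e14540-ca63-4205-b285-c6dde79bxxxx");
        event.put("transcriptionEnabled", "true");
        event.put("saveCallRecording", "true");
        InvocationEventSketch request = fromMap(event);
        System.out.println("Transcribe: " + request.transcriptionEnabled
                + ", record: " + request.saveCallRecording
                + ", language: " + request.languageCode);
    }
}
```

Validating the required fields up front lets the function fail with a clear message before it opens the Kinesis Video Stream.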

Audio Utils

This solution uses the Connect Audio Utils project for combining audio files. For details on building Connect Audio Utils, see the Amazon Connect Audio Utils project.

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.