Asterisk External Media Sample

This package demonstrates how to use the ARI External Media feature to transcribe the audio from a bridge using the Google Speech APIs.

Installation

Prerequisites

To run the sample you'll need:

- An Asterisk instance with ARI enabled and a user configured in ari.conf. The External Media feature requires Asterisk 16.6 or later.
- Node.js and npm.
- Google Cloud credentials for the Speech API, exported via GOOGLE_APPLICATION_CREDENTIALS (see "Try It!" below).

Run npm install from the top of the source tree. This will install the required npm packages, including node-ari-client and @google-cloud/speech. You can then run the transcriber as bin/ari-transcriber. If you add the -g option to npm install to install it system-wide, you can run ari-transcriber directly.
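For example:

$ npm install
$ bin/ari-transcriber --help

or, with a system-wide install:

$ npm install -g
$ ari-transcriber --help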

Usage

$ ari-transcriber --help
ari-transcriber [options] <dialstring>

Start the transcription server

Positionals:
  dialstring  Extension to dial such as "Local/1234"

Incoming Audio Server
  --format, -f        Asterisk format/codec                                         [string] [choices: "ulaw", "slin16"] [default: "ulaw"]
  --listenServer, -l  Address and port on which to listen for audio from Asterisk                     [string] [default: "127.0.0.1:9999"]

Speech
  --speechModel         Google Speech API model                  [string] [choices: "phone_call", "video", "default"] [default: "default"]
  --speechLang          BCP-47 Language code.  en-US, fr-CA, etc.                  [string] [choices: "en-US", "fr-CA"] [default: "en-US"]
  --speakerDiarization  Outputs words associated with a speaker index to the console.                           [boolean] [default: false]

ARI
  --ariServerUrl, -a  The URL for the Asterisk instance                                        [string] [default: "http://127.0.0.1:8088"]
  --ariUser, -u       The user configured in ari.conf                                                       [string] [default: "asterisk"]
  --ariPassword, -p   The password for the user configured in ari.conf                                      [string] [default: "asterisk"]

Transcription WebSocket
  --sslCert  WebSocket secure server (wss) certificate. If omitted, a non-secure (ws) WebSocket server will be created            [string]
  --sslKey   WebSocket secure server key. Must be supplied if sslCert is supplied                                                 [string]
  --wssPort  WebSocket server port.  If omitted, no WebSocket server will be started                                              [number]

Options:
  --version          Show version number                                                                                         [boolean]
  --help             Show help                                                                                                   [boolean]
  --audioOutput, -o  A file into which the raw audio from Asterisk can be written                                                 [string]

Examples:
  ari-transcriber --format=slin16 --sslCert=/etc/letsencrypt/live/myserver/fullchain.pem
  --sslKey=/etc/letsencrypt/live/myserver/privkey.pem --wssPort=39990 --speakerDiarization 'Local/1234'
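When --wssPort is supplied, live transcriptions are also pushed to connected WebSocket clients. A minimal browser-side sketch, assuming each transcription arrives as a plain text frame (the host, port, and message format here are illustrative):

// Connect to the transcription WebSocket started with --wssPort=39990.
const ws = new WebSocket('wss://myserver:39990/');

// Log each transcription frame as it arrives.
ws.onmessage = (event) => console.log('Transcription:', event.data);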

The ari-transcriber performs several tasks. It:

- Connects to the Asterisk instance over ARI
- Creates a mixing bridge and originates a Local channel to the supplied dialstring
- Starts a server on the listenServer address to receive audio from Asterisk, and creates an External Media channel that sends the bridge's audio to that server
- Streams the received audio to the Google Speech API and prints the transcriptions to the console
- Optionally writes the raw audio to a file (--audioOutput) and serves live transcriptions over a WebSocket (--wssPort)

Why the Local channel and bridge? To keep the sample as simple as possible, it assumes that a conference bridge is already available. Since that bridge isn't controlled by ARI/Stasis, we can't just add the External Media channel directly to it. Instead, we have to create a Local channel that dials the conference bridge, then bridge that channel with the External Media channel. If you were adding External Media capabilities to your own application, chances are your app would already control the participant bridge, and you wouldn't need the Local channel and mixing bridge.
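In code, that flow might look like the following condensed node-ari-client sketch. This is an illustration rather than the actual transcriber source; the Stasis app name, credentials, and addresses are placeholders, and error handling is omitted:

const ari = require('ari-client');

async function start(dialstring) {
  // Connect to ARI with the defaults from the options above.
  const client = await ari.connect('http://127.0.0.1:8088', 'asterisk', 'asterisk');

  // Mixing bridge that will join the Local channel to the External Media channel.
  const bridge = client.Bridge();
  await bridge.create({ type: 'mixing' });

  // Every channel that enters our Stasis app is dropped into the bridge.
  client.on('StasisStart', (event, channel) => {
    bridge.addChannel({ channel: channel.id });
  });
  client.start('transcriber');

  // Local channel that dials into the existing conference bridge.
  const local = client.Channel();
  await local.originate({ endpoint: dialstring, app: 'transcriber' });

  // External Media channel that streams the bridge's audio to our
  // listen server (requires Asterisk 16.6+).
  const extMedia = client.Channel();
  await extMedia.externalMedia({
    app: 'transcriber',
    external_host: '127.0.0.1:9999',
    format: 'slin16',
  });
}

start('Local/1234');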

Try It!

You don't need the WebSocket transcription server to try this, just a phone to call in with.

$ export GOOGLE_APPLICATION_CREDENTIALS=<path to Google API credentials>
$ ari-transcriber --format=slin16 'Local/1234'
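This assumes your dialplan already lands extension 1234 in a conference. A hypothetical extensions.conf entry (the context and bridge name are placeholders; Local channels use the default context unless you specify one, e.g. Local/1234@my-context):

; Dialplan entry providing the conference that Local/1234 dials into.
[default]
exten => 1234,1,Answer()
 same => n,ConfBridge(demo)

Once a caller joins the conference and starts talking, the transcriptions appear on the console.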