charisma-ai / charisma-sdk-js

Charisma.ai SDK for JavaScript (browser)
MIT License

Add to docs: the differences between microphone.startListening() and playthrough.startSpeechRecognition() #41

Closed. John-A-J closed this issue 2 months ago.

John-A-J commented 3 months ago

There are two speech recognition systems available in the Charisma JS SDK.

  1. microphone.startListening() uses the browser's built-in SpeechRecognition API. It's free, but it isn't available in every browser (Firefox, for example). This is what the https://charisma.ai/ website editor uses in the chat tester/‘play’ page.
  2. playthrough.startSpeechRecognition() is our premium alternative, which uses Deepgram (or AWS/Google) under the hood. It works in every browser and consumes credits, and its quality should be significantly better.
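If you want to pick between the two automatically, one option is to feature-detect the browser API that method 1 depends on. The sketch below is a hypothetical helper (chooseSpeechRecognitionMethod is not part of the SDK) and assumes you prefer the free path whenever the browser supports it, falling back to method 2 otherwise. It checks webkitSpeechRecognition as well, because some browsers (such as Chrome) expose the API only under that prefix.

```javascript
// Hypothetical helper (not part of the SDK): decides which of the two speech
// recognition systems to use, based on whether the browser exposes the Web
// Speech API. `globalScope` is normally `window`; it is a parameter so the
// logic can be exercised outside a browser.
function chooseSpeechRecognitionMethod(globalScope) {
  // Some browsers expose the API only under the `webkit` prefix;
  // Firefox exposes neither name by default.
  const hasBrowserSTT =
    typeof globalScope.SpeechRecognition === "function" ||
    typeof globalScope.webkitSpeechRecognition === "function";

  // "browser" -> method 1, microphone.startListening() (free)
  // "premium" -> method 2, playthrough.startSpeechRecognition() (uses credits)
  return hasBrowserSTT ? "browser" : "premium";
}

// Usage in a browser:
// const method = chooseSpeechRecognitionMethod(window);
```

Swap the preference if quality matters more than cost, since method 2 should give better results.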

Regarding method 2, we recently shipped a fix in @charisma-ai/sdk version 4.0.3 that resolves an issue with an incorrect sample rate, which led to odd, garbled, or empty results in certain browsers.
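To check whether an installed version already includes that fix (for example, the version reported by npm ls @charisma-ai/sdk), a small comparison like the following is enough. hasSampleRateFix is a hypothetical helper, not an SDK function. It only handles plain major.minor.patch strings (an optional leading "v" is allowed) and treats a pre-release such as 4.0.3-beta.1 as if it were 4.0.3.

```javascript
// Hypothetical helper (not part of the SDK): returns true if `version` is at
// least 4.0.3, the release that fixed the sample-rate issue.
function hasSampleRateFix(version) {
  const required = [4, 0, 3];
  // Drop a leading "v" and any pre-release/build suffix ("-beta.1", "+abc").
  const core = version.replace(/^v/, "").split(/[-+]/)[0];
  const parts = core.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n))) {
    throw new Error(`Unrecognised version string: ${version}`);
  }
  // Compare major, then minor, then patch.
  for (let i = 0; i < required.length; i += 1) {
    if (parts[i] > required[i]) return true;
    if (parts[i] < required[i]) return false;
  }
  return true; // exactly 4.0.3
}
```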

Could you confirm that you're using method 2 and are on the latest version of the SDK (4.0.3)? If so, try the AWS or Google engine by passing service: "unified:aws" or service: "unified:google" in the startSpeechRecognition options, and see whether that works any better.
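A sketch of building those options, assuming startSpeechRecognition accepts a single options object with a service field; only the two service values named above come from this thread. buildSpeechRecognitionOptions is a hypothetical helper. Passing no engine leaves service unset, which presumably keeps the default (Deepgram) engine.

```javascript
// The two alternative engines mentioned in this thread.
const ALTERNATIVE_STT_SERVICES = {
  aws: "unified:aws",
  google: "unified:google",
};

// Hypothetical helper (not part of the SDK): builds the options object for
// playthrough.startSpeechRecognition(). ASSUMPTION: the method takes one
// options object and reads its `service` field.
function buildSpeechRecognitionOptions(engine) {
  if (engine === undefined) return {}; // leave `service` unset: default engine
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(ALTERNATIVE_STT_SERVICES, engine)) {
    throw new Error(`Unknown engine "${engine}"; expected "aws" or "google"`);
  }
  return { service: ALTERNATIVE_STT_SERVICES[engine] };
}

// Usage (assumed call shape):
// playthrough.startSpeechRecognition(buildSpeechRecognitionOptions("aws"));
```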

There is a message called speech-recognition-started, which we send to the client once the connection to the STT service has been initialised successfully. You can react to it with onSpeechRecognitionStarted on a Playthrough. You can also see these messages in Chrome DevTools by inspecting the WebSocket traffic.
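To detect the case where recognition never initialises, you could wait for that message with a timeout. This is a sketch that assumes onSpeechRecognitionStarted takes a callback which runs when speech-recognition-started arrives; check the SDK's type definitions for the exact signature. waitForSpeechRecognitionStarted and its 5000 ms default are illustrative choices, not SDK behaviour.

```javascript
// Hypothetical helper (not part of the SDK): resolves once the playthrough
// reports that the STT connection is ready, or rejects if that does not
// happen within `timeoutMs`.
// ASSUMPTION: onSpeechRecognitionStarted(callback) registers a listener that
// is called when the speech-recognition-started message arrives.
function waitForSpeechRecognitionStarted(playthrough, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`No speech-recognition-started message after ${timeoutMs} ms`));
    }, timeoutMs);

    playthrough.onSpeechRecognitionStarted(() => {
      clearTimeout(timer); // started in time, so cancel the pending rejection
      resolve();
    });
  });
}

// Usage: register the wait before starting, so the message cannot be missed.
// const ready = waitForSpeechRecognitionStarted(playthrough);
// playthrough.startSpeechRecognition({ service: "unified:aws" });
// await ready;
```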

The custom STT options are described in more detail here, and some of them may improve compatibility with different browsers.

Also, add a further ticket to address this in the React SDK, which may need to handle it slightly differently.