AssemblyAI / assemblyai-ruby-sdk

The AssemblyAI Ruby SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, audio intelligence models, as well as the latest LeMUR models.
https://www.assemblyai.com
MIT License

Add support for Streaming STT (realtime) #46

Open Swimburger opened 4 months ago

Swimburger commented 4 months ago

Currently, we don't have plans to add support for Streaming STT. We will reconsider this decision if there is enough demand for Streaming STT in Ruby. Add a 👍 to this issue if you want Streaming STT support or leave a comment.

son1112 commented 3 months ago

How open are you to public contribution toward this effort?

Swimburger commented 2 months ago

@son1112, we'd be happy to accept it as a contribution. The only condition is that it should be clear that we don't officially support it, either via its name or some other way to mark it as experimental.

Michael9311 commented 1 month ago

Is anyone working on this?

I'm just going to dump my client here in case anyone wants it. I'm working with Twilio, so it defaults to mulaw. Remember to buffer, unpack, and pack your binary audio data.

require "base64"
require "eventmachine"
require "faye/websocket"
require "http"
require "json"

class AssemblyClient
  ASSEMBLYAI_API_URL = "https://api.assemblyai.com/v2/realtime/token"
  ASSEMBLYAI_WS_BASE_URL = "wss://api.assemblyai.com/v2/realtime/ws"

  attr_reader :token, :ws_url, :websocket
  attr_accessor :on_transcript

  def initialize(api_key: ENV["ASSEMBLY_API_KEY"], sample_rate: 8000, encoding: "pcm_mulaw", expires_in: 3600)
    @api_key = api_key
    @sample_rate = sample_rate
    @encoding = encoding
    @expires_in = expires_in
    @token = fetch_token
    @ws_url = build_ws_url
    @on_transcript = nil
  end

  # Exchanges the API key for a temporary real-time token.
  def fetch_token
    headers = { "authorization" => @api_key, "content-type" => "application/json" }
    body = { "expires_in" => @expires_in }.to_json
    response = HTTP.headers(headers).post(ASSEMBLYAI_API_URL, body: body)
    if response.status.success?
      JSON.parse(response.body.to_s)["token"]
    else
      raise "Error fetching token: #{response.body}"
    end
  end

  def build_ws_url
    "#{ASSEMBLYAI_WS_BASE_URL}?sample_rate=#{@sample_rate}&encoding=#{@encoding}&token=#{@token}"
  end

  # Opens the WebSocket inside an EventMachine reactor. EM.run blocks until EM.stop is called.
  def start
    EM.run do
      puts @ws_url
      @websocket = Faye::WebSocket::Client.new(@ws_url)

      @websocket.on :open do |_event|
        puts "Assembly WebSocket connection opened"
      end

      @websocket.on :message do |event|
        handle_message(event.data)
      end

      @websocket.on :error do |event|
        puts "Assembly WebSocket error: #{event.message}"
      end

      @websocket.on :close do |event|
        puts "Assembly WebSocket closed with code: #{event.code}, reason: #{event.reason}"
        EM.stop
      end
    end
  end

  # audio_data is a Base64-encoded audio payload (this is what Twilio sends).
  def send_audio(audio_data)
    if @websocket && @websocket.ready_state == Faye::WebSocket::API::OPEN
      # Decode to an Array of byte values; faye-websocket sends an Array as a binary frame.
      binary_audio_data = Base64.decode64(audio_data).bytes
      @websocket.send(binary_audio_data)
    else
      puts "Calling Assembly before Open"
      # raise "WebSocket is not open. Call start() before sending audio."
    end
  end

  def stop
    if @websocket
      @websocket.send(JSON.generate({ "terminate_session" => true }))
      @websocket.close
    end
  end

  private

  # Logs the raw message, then passes partial and final transcripts to the callback if one is set.
  def handle_message(message)
    puts message
    data = JSON.parse(message)
    if ["PartialTranscript", "FinalTranscript"].include?(data["message_type"])
      transcript = data["text"]
      if @on_transcript
        @on_transcript.call(transcript, data["message_type"])
      else
        puts "Transcription (#{data["message_type"]}): #{transcript}"
      end
    end
  end
end
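
The "buffer, unpack and pack" step mentioned above is not part of the client itself, so here is a rough sketch of one way it might look. It rests on assumptions that are not stated in this thread: that the incoming payloads are small Base64-encoded frames (for example, about 160 bytes of 8 kHz mulaw per 20 ms), and that the real-time endpoint prefers chunks of at least roughly 100 ms. `AudioChunkBuffer` and `min_bytes` are hypothetical names, not part of the SDK or of any AssemblyAI or Twilio API.

```ruby
# Hypothetical helper: collects Base64-encoded audio payloads and releases
# them as larger binary chunks once enough audio has accumulated.
class AudioChunkBuffer
  # min_bytes: 800 bytes is 100 ms of 8 kHz mulaw (1 byte per sample).
  # The 100 ms target is an assumption; adjust it to what the endpoint expects.
  def initialize(min_bytes: 800)
    @min_bytes = min_bytes
    @buffer = String.new # an empty String.new is binary (ASCII-8BIT)
  end

  # Decodes one Base64 payload and appends the raw bytes to the buffer.
  # Returns an Array of byte values once at least min_bytes are buffered,
  # otherwise nil.
  def push(base64_payload)
    @buffer << base64_payload.unpack1("m") # "m" decodes Base64
    return nil if @buffer.bytesize < @min_bytes

    flush
  end

  # Returns everything buffered so far (possibly an empty Array) and resets.
  def flush
    bytes = @buffer.bytes
    @buffer = String.new
    bytes
  end
end
```

Each non-nil result of `push` is an Array of byte values, which faye-websocket's `send` transmits as a binary frame. Calling `flush` before terminating the session would send any remaining audio.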