anycable / anycable-twilio-hanami-demo

Using AnyCable and Hanami to build an app to process Twilio Media streams

Is there any way to use two-way communication with AnyCable? #1

Open zamananjum0 opened 10 months ago

zamananjum0 commented 10 months ago

I need help building a bidirectional chat system with Twilio, in which an AI agent can communicate with the caller.

junket commented 2 weeks ago

@zamananjum0 Did you figure this out? Looking at the AnyCable code, this demo utilizes a custom executor to handle incoming messages on the WS but doesn't appear to have a similar model in place for some custom handler to send messages back via WS. Am I wrong?

junket commented 2 weeks ago

@palkan It would be amazing to get your input on if/how this can be accomplished. Thank you for the excellent demo!

palkan commented 2 weeks ago

Hey there,

Sorry, I didn't have notifications turned on for this repo, so I missed this one.

There are two ways you can implement bi-directional communication:

1) Move it to the Go app. This would be more performant but requires more work and maintenance on the Go app side. The entry point for adding responses is somewhere around here: https://github.com/anycable/anycable-twilio-hanami-demo/blob/d9dcf083971ed67a89d1423b5765d26ba68bfa10/twilio-cable/pkg/twilio/executor.go#L155 (and you can use session.Send(msg) to send data back to the session).

2) Use AnyCable pub/sub capabilities. You can subscribe your session to some named stream here (e.g., stream_from "twilio/#{call_sid}") and then use ActionCable.server.broadcast (or AnyCable.broadcast) to send data back to this stream. There is a bit of overhead for pub/sub, but all the logic lives in your Ruby/Rails app.
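
A minimal sketch of option 2, assuming a Rails app with ActionCable backed by AnyCable. The helper names twilio_stream_for and broadcast_to_call are hypothetical, and the defined? guard is only there so the snippet also runs outside Rails. The subscription itself would live in whichever channel handles the call.

```ruby
# Hypothetical helper: one named stream per call, following the
# "twilio/<call_sid>" naming from the example above.
def twilio_stream_for(call_sid)
  "twilio/#{call_sid}"
end

# In the channel handling the call (assumed to know call_sid), subscribe once:
#   stream_from twilio_stream_for(call_sid)

# Hypothetical helper: send data back to the session subscribed to the
# call's stream. Returns the stream name so callers can log it.
def broadcast_to_call(call_sid, data)
  stream = twilio_stream_for(call_sid)
  # Only broadcast when ActionCable is loaded (i.e., inside the Rails app).
  ActionCable.server.broadcast(stream, data) if defined?(ActionCable)
  stream
end
```

Keeping the stream name in one helper means the subscribing channel and the broadcasting code cannot drift apart.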

junket commented 2 weeks ago

That's great, @palkan. Thank you!

For my project, I'm inclined to keep the business logic in Rails as long as possible, so option #2 here would be great. But I found that if I simply broadcast my data from Rails in the form of a media_event message (the JSON format Twilio expects), AnyCable does not appear to publish that message to the Twilio media stream (i.e. the client in this case).
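
For reference, a sketch of the outbound media message described here, based on Twilio's documented Media Streams format: an event of "media", the streamSid, and a base64 payload of 8 kHz mu-law audio. The helper name and the sample stream SID are made up for illustration.

```ruby
require "json"

# Hypothetical helper: wraps raw mu-law bytes in a Twilio "media" message.
def twilio_media_message(stream_sid, audio_bytes)
  {
    event: "media",
    streamSid: stream_sid,
    # pack("m0") produces strict base64 (no line breaks) without extra gems.
    media: {payload: [audio_bytes].pack("m0")}
  }
end

# 160 bytes of 0xFF is 20ms of mu-law silence; "MZXXXXXXX" is a placeholder SID.
message = twilio_media_message("MZXXXXXXX", "\xFF".b * 160)
puts JSON.generate(message)
```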

The logs suggest that the broadcast was scheduled, but no outbound audio is played in the stream and I'm not sure how to "see" what the message relayed by AnyCable looked like. I get:

2024-06-15 12:09:40.167 DBG handle broadcast message nodeid=oDo8m8 context=node payload.stream=call_CAXXXXXXX payload.data="\"{\\\"event\\\":\\\"media\\\",\\\"sequenceNumber\\\":\\\"149\\\",\\\"media\\\":{\\\"payload\\\": \\\"dHd5en1//v39/fz8/f3+////f...(204)"
2024-06-15 12:09:40.167 DBG incoming broadcast message nodeid=oDo8m8 context=node payload.stream=call_CAXXXXXXX payload.data="\"{\\\"event\\\":\\\"media\\\",\\\"sequenceNumber\\\":\\\"149\\\",\\\"media\\\":{\\\"payload\\\": \\\"dHd5en1//v39/fz8/f3+////f...(204)"
2024-06-15 12:09:40.167 DBG schedule broadcast nodeid=oDo8m8 context=node component=hub gate=3 stream=call_CAXXXXXXX message.stream=call_CAXXXXXXX message.data="\"{\\\"event\\\":\\\"media\\\",\\\"sequenceNumber\\\":\\\"149\\\",\\\"media\\\":{\\\"payload\\\": \\\"dHd5en1//v39/fz8/f3+////f...(204)"

I wonder if this is because we need our custom encoder to do something with the message payload from Rails? Or perhaps we need to specify a BroadcastType to control how it is delivered to the socket? Any hints--even just a pointer on how to log what AnyCable is broadcasting to the client--would help tremendously 🙏
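
One detail worth checking in the logs above: payload.data starts with an escaped quote followed by a brace, which usually means the broadcast data was a JSON string rather than a JSON object, as happens when an already-serialized string is serialized a second time. Whether this matters depends on what the demo's encoder expects, so treat it as a hint rather than a diagnosis. A sketch of the difference in plain Ruby:

```ruby
require "json"

message = {event: "media", sequenceNumber: "149"}

encoded_once = JSON.generate(message)        # a JSON object
encoded_twice = JSON.generate(encoded_once)  # a JSON string that contains JSON

puts encoded_once   # {"event":"media","sequenceNumber":"149"}
puts encoded_twice  # "{\"event\":\"media\",\"sequenceNumber\":\"149\"}"
```

A receiver that parses encoded_twice once gets a String back, not a Hash, so any code that looks up the event key on it would find nothing.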

junket commented 2 weeks ago

Let me answer my own question: This example app actually contains everything you need for bi-directional communication with Twilio media streams. I just needed to learn some Go. Thank you, @palkan and @irinanazarova!

The encoder's Encode method is called on the reply, per the in-code comment "Encoder converts messages from/to Twilio format to AnyCable format." It casts the message sent back to the socket connection as one of the Twilio formats according to the Type, which I believe can be set by adding the metadata field BroadcastType to the broadcast from our ActionCable server.

Although I have not yet figured out how to include this metadata in my default Rails ActionCable broadcasts (tips appreciated!) I can see that by tweaking the Go code to assume the Type is a MediaEvent, I can coax AnyCable into passing back my audio data to the stream, completing the bi-directional stream. 👍👍

palkan commented 1 week ago

@junket Thanks for sharing your insights!

how to include this metadata in my default Rails ActionCable broadcasts (tips appreciated!)

Something like this should work:

AnyCable::Rails.with_broadcast_options(broadcast_type: "...") do
  code_that_performs_broadcasts
end

Or you can directly use AnyCable: AnyCable.broadcast(stream, data, {broadcast_type: "..."}).

junket commented 1 week ago

One thing I'm exploring (and would love to hear your take @palkan) is the ideal audio chunk size when streaming from ActionCable to AnyCable-Go.

My first successful stream used 20ms (160-byte) chunks of 8 kHz mu-law and came through cleanly in local development. However, on all subsequent tries, ActionCable seemed unable to send the chunks quickly enough, leading to unusably choppy audio. Is that an ActionCable performance bottleneck? It's odd that the first run performed better. My code is a simple test that chunks a small file, like:

chunk_size = 160 # bytes per 20ms chunk
File.open(file_path, 'rb') do |file|
  while (chunk = file.read(chunk_size))
    ActionCable.server.broadcast(stream, chunk)
  end
end

If I up the chunk size to 200ms, ActionCable can keep up and the audio is better. I assume there are trade-offs here but I am too much of a newb to really know what they are 😊
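
The chunk sizes above can be worked out with a little arithmetic. This is a sketch, assuming 8 kHz mu-law at one byte per sample; the helper names are hypothetical.

```ruby
# Bytes needed to hold chunk_ms of audio (8 kHz, one byte per mu-law sample).
def chunk_bytes(chunk_ms, sample_rate_hz: 8_000, bytes_per_sample: 1)
  sample_rate_hz * bytes_per_sample * chunk_ms / 1000
end

# How many broadcasts per second are needed to keep up with real time.
def broadcasts_per_second(chunk_ms)
  1000.0 / chunk_ms
end

[20, 200].each do |ms|
  puts "#{ms}ms -> #{chunk_bytes(ms)} bytes, #{broadcasts_per_second(ms)} broadcasts/s"
end
# 20ms -> 160 bytes, 50.0 broadcasts/s
# 200ms -> 1600 bytes, 5.0 broadcasts/s
```

So moving from 20ms to 200ms chunks cuts the number of broadcasts tenfold. The likely cost is that more audio has to be ready before anything is sent.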

palkan commented 1 week ago

I think the reason is the added network latency: 20ms of audio is played faster than the next 20ms arrives. When you send larger chunks, the network latency is about the same, but the buffer is still full of audio from the previous chunk.
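
A toy model of this reasoning, assuming chunks are sent one after another and each broadcast takes a fixed time to reach the caller. The function name and the 30ms figure are made up for illustration and are not measurements from this thread.

```ruby
# Silence (in ms) heard after each chunk finishes playing, when the next
# chunk needs per_send_ms to arrive. Zero means playback keeps up.
def gap_per_chunk_ms(chunk_ms, per_send_ms)
  [per_send_ms - chunk_ms, 0].max
end

puts gap_per_chunk_ms(20, 30)   # 10
puts gap_per_chunk_ms(200, 30)  # 0
```

With the same per-send delay, the 20ms chunks leave a gap after every chunk, while the 200ms chunks always arrive before the previous one has finished playing.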

junket commented 1 week ago

Yep, this has to be it. I switched my broadcast URL from ngrok to localhost and the latency was virtually gone. I'll experiment with ideal chunk size. I assume there is an upper limit, but I don't know.