fishjam-dev / membrane_rtc_engine

Customizable Real-time Communication Engine/SFU library focused on WebRTC.
Apache License 2.0

Make RTC engine modular #33

Closed mat-hek closed 2 years ago

mat-hek commented 3 years ago

Currently, to add new functionality to the RTC engine (like stream recording), it's necessary to modify the engine itself. This is problematic for the following reasons:

An idea of how to solve this is to reduce the RTC engine responsibility to route the traffic between externally delivered endpoints. Such endpoints could handle various protocols and serve different purposes. Apart from the WebRTC endpoint, we could have

The RTC Engine could be only aware of the routing stuff - peers, tracks (their metadata, possibly also codecs), distribution, etc.

Implementation-wise, RTC endpoints could be implementations of an RTC.Endpoint behaviour. Each RTC endpoint could provide a piece of pipeline to be linked into the engine. Possibly it could be a Membrane.Bin exposing compatible pads and handling messages defined in an API. For example, to make the WebRTC.Endpoint work with the RTC engine, we would possibly have to wrap it into the RTC.Endpoint implementation - that way the WebRTC.Endpoint wouldn't rely on the RTC Engine nor the other way around. Then, when adding/accepting a new peer, the user could choose which RTC.Endpoint implementation to use for that peer.
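To make the behaviour idea above more concrete, here is a minimal sketch in plain Elixir. Everything in it is hypothetical: the module names (`RTCSketch.Endpoint`, `RTCSketch.RecordingEndpoint`, `RTCSketch.Engine`), the callbacks, and the track shape are illustrative assumptions, not an existing Membrane API. A real endpoint would most likely return a piece of pipeline (for example a Membrane.Bin) rather than a plain map.

```elixir
defmodule RTCSketch.Endpoint do
  @moduledoc "Hypothetical behaviour for pluggable endpoints; not a real Membrane API."

  @type track :: %{id: String.t(), type: :audio | :video}

  # Sets up the endpoint's private state from user-supplied options.
  @callback init(opts :: keyword()) :: {:ok, state :: map()}

  # Lets the endpoint decide whether it wants a newly published track.
  @callback handle_track(track(), state :: map()) :: {:accept | :ignore, map()}
end

defmodule RTCSketch.RecordingEndpoint do
  @moduledoc "Hypothetical endpoint that only records video tracks."
  @behaviour RTCSketch.Endpoint

  @impl true
  def init(opts), do: {:ok, %{dir: Keyword.get(opts, :dir, "recordings"), tracks: []}}

  @impl true
  def handle_track(%{type: :video} = track, state),
    do: {:accept, %{state | tracks: [track.id | state.tracks]}}

  def handle_track(_track, state), do: {:ignore, state}
end

defmodule RTCSketch.Engine do
  @moduledoc "Hypothetical engine core: it knows the behaviour, never a concrete endpoint."

  # Registers an endpoint instance under `id`, delegating setup to its module.
  def add_endpoint(endpoints, id, module, opts \\ []) do
    {:ok, state} = module.init(opts)
    Map.put(endpoints, id, {module, state})
  end

  # Offers a track to every registered endpoint and keeps their updated states.
  def publish(endpoints, track) do
    Map.new(endpoints, fn {id, {module, state}} ->
      {_decision, new_state} = module.handle_track(track, state)
      {id, {module, new_state}}
    end)
  end
end
```

The point of the sketch is the direction of the dependency: the engine calls only the behaviour's callbacks, so a WebRTC wrapper, a recorder, or any other endpoint could be added without touching the engine.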

Since the recording endpoint, for example, can be spawned per user or per entire room, possibly both the RTC endpoint implementation and the user choosing it should be able to decide which streams the endpoint instance should receive (or where its streams should be sent).
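One possible way to express "both the endpoint and the user can decide which streams it receives" is to combine a default scope chosen by the endpoint with an optional filter supplied by the user. The sketch below is only an assumption about how that could look; `RoutingSketch`, the `:scope` and `:filter` keys, and the track fields are all hypothetical names.

```elixir
defmodule RoutingSketch do
  @moduledoc "Hypothetical: decides which tracks a given endpoint instance receives."

  # `scope` is the endpoint's own default: :room (every track) or
  # {:peer, peer_id} (only that peer's tracks).
  # `filter` is an optional user-supplied predicate that narrows it further.
  def tracks_for(endpoint, tracks) do
    user_filter = Map.get(endpoint, :filter, fn _track -> true end)

    tracks
    |> Enum.filter(&in_scope?(endpoint.scope, &1))
    |> Enum.filter(user_filter)
  end

  defp in_scope?(:room, _track), do: true
  defp in_scope?({:peer, peer_id}, track), do: track.peer_id == peer_id
end
```

With this shape, a room-wide recorder would use `%{scope: :room}`, while a per-user recorder limited to video could use `%{scope: {:peer, "p1"}, filter: fn t -> t.type == :video end}`.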

To get there from the current state, we should remove all the WebRTC-specific stuff from the RTC engine and move it either to the WebRTC.Endpoint or to the new RTC.Endpoint implementation for the WebRTC.Endpoint. We should remember that the WebRTC.Endpoint should remain usable without the RTC Engine.

Let me know what you think ;)

sax commented 3 years ago

I like the idea of being able to plug in different endpoints.

One idea we had was to identify clear callbacks, and have one process for the Membrane.RTC.Engine rather than having to start a separate process to keep track of application state.

defmodule MyApp.Room do
  use Membrane.RTC.Engine

  @impl Membrane.RTC.Engine
  def initialize(state) do
    # ....
    {:ok, %{}} # the returned map is saved as the app-specific state
  end

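  @impl Membrane.RTC.Engine
  @spec new_peer(map(), map()) ::
          :accept
          | {:accept, node()}
          | {:accept, endpoint_module :: module()}
          | {:accept, endpoint_module :: module(), node()}
  def new_peer(peer_data, app_state) do
    # Must match one of the shapes in the spec above; a bare atom, not a 1-tuple.
    :accept
  end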

  @impl Membrane.RTC.Engine
  @spec handle_track_added(map(), list(peer()), map()) ::
          {:link, list(peer()), map()} | {:ignore, map()}
  def handle_track_added(new_track, other_peers, app_state) do
    if some_logic?() do
      # Enum.filter/2 expects a one-argument predicate.
      peer_subset = Enum.filter(other_peers, &select_some_peers/1)
      {:link, peer_subset, app_state}
    else
      {:ignore, app_state}
    end
  end

  @impl Membrane.RTC.Engine
  def handle_peer_ready(new_peer, other_peer_tracks, app_state) do
    track_subset = Enum.filter(other_peer_tracks, &select_some_tracks/1)
    {:link, track_subset, app_state}
  end
end

So the places where the engine calls send/2 to the callback process could all be replaced by callback functions. Maybe there would be places where the callbacks would be able to choose between different types of endpoints.
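As a rough illustration of that replacement, the sketch below shows an engine process that holds the callback module and the app state itself and invokes the callback directly instead of messaging a separate process. It is a simplified assumption, not the engine's actual code: `CallbackEngineSketch`, `MyRoomSketch`, `add_peer/2`, and the `:deny` return value are hypothetical and not part of the proposal above.

```elixir
defmodule CallbackEngineSketch do
  @moduledoc "Hypothetical engine that drives user callbacks from a single process."
  use GenServer

  # Hypothetical callbacks, loosely following the shape proposed in this thread.
  @callback initialize(opts :: term()) :: {:ok, app_state :: map()}
  @callback new_peer(peer_data :: map(), app_state :: map()) :: :accept | :deny

  def start_link(callback_module, opts \\ []),
    do: GenServer.start_link(__MODULE__, {callback_module, opts})

  def add_peer(engine, peer_data), do: GenServer.call(engine, {:add_peer, peer_data})

  @impl GenServer
  def init({callback_module, opts}) do
    {:ok, app_state} = callback_module.initialize(opts)
    {:ok, %{callbacks: callback_module, app_state: app_state, peers: []}}
  end

  @impl GenServer
  def handle_call({:add_peer, peer_data}, _from, state) do
    # A direct function call takes the place of a send/2 to a callback process.
    case state.callbacks.new_peer(peer_data, state.app_state) do
      :accept -> {:reply, :ok, %{state | peers: [peer_data | state.peers]}}
      :deny -> {:reply, {:error, :denied}, state}
    end
  end
end

defmodule MyRoomSketch do
  @moduledoc "Hypothetical application module implementing the callbacks."
  @behaviour CallbackEngineSketch

  @impl CallbackEngineSketch
  def initialize(_opts), do: {:ok, %{banned: ["mallory"]}}

  @impl CallbackEngineSketch
  def new_peer(%{id: id}, %{banned: banned}) do
    if id in banned, do: :deny, else: :accept
  end
end
```

Starting it with `CallbackEngineSketch.start_link(MyRoomSketch)` gives one process that owns both the engine bookkeeping and the application state, which is the trade-off being discussed here.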

sax commented 3 years ago

The types of things we know we do:

Whether or not different types of Endpoint bins exist, I think that soon we'll want to be able to:

mat-hek commented 3 years ago

> One idea we had was to identify clear callbacks, and have one process for the Membrane.RTC.Engine rather than having to start a separate process to keep track of application state.

Our approach was basically to do that in the next API layer, which would handle application state management, but also some common things that conferencing apps have, like user roles (admin/registered/guest), join queues, hand raising, etc. That one could be callback-based, so more like a framework. But underneath we planned to keep an engine with a simple request-response API (more like a library) that can be developed separately. It's definitely possible that we're overcomplicating things (again :P), though I think it would be great to keep user state management and non-media features somewhat separate. However, we can switch to callbacks now anyway and recreate the intermediary API when things grow too much.

What was already suggested, and what we can do right away, is wrapping the sends in functions, so they don't have to be done manually. That would make it easy to later change them into GenServer.call/2 invocations to make things more synchronous.
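A minimal sketch of that wrapping, under the assumption that the engine is a GenServer-like process: `EngineApiSketch` and its function and message names are hypothetical. The caller only sees the function, so the transport behind it can change from send/2 to GenServer.call/2 without touching call sites.

```elixir
defmodule EngineApiSketch do
  @moduledoc "Hypothetical API wrappers that hide raw message passing."
  use GenServer

  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, [])

  # Asynchronous wrapper: hides send/2 and the message shape from callers.
  def add_peer_async(engine, peer_id) do
    send(engine, {:add_peer, peer_id})
    :ok
  end

  # Synchronous variant: same intent, but returns only after the engine handled it.
  def add_peer(engine, peer_id), do: GenServer.call(engine, {:add_peer, peer_id})

  def peers(engine), do: GenServer.call(engine, :peers)

  @impl true
  def init(peers), do: {:ok, peers}

  @impl true
  def handle_info({:add_peer, peer_id}, peers), do: {:noreply, [peer_id | peers]}

  @impl true
  def handle_call({:add_peer, peer_id}, _from, peers), do: {:reply, :ok, [peer_id | peers]}
  def handle_call(:peers, _from, peers), do: {:reply, Enum.reverse(peers), peers}
end
```

Both wrappers end up in the same state update; the difference is only whether the caller waits for a reply.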

That said, if I understand correctly, even though this is somewhat related to modularization, it seems to me like a separate issue. I'm not sure how it would help parallelise work on the RTC engine itself. I think we should choose what to focus on ;)