Make RTC engine modular

Currently, to add new functionality to the RTC engine (like stream recording), it's necessary to modify the engine itself. This is problematic for the following reasons:

- it's hard to add custom/proprietary functionality to the engine
- adding new functionalities in parallel results in many conflicts
- the size of the RTC engine grows quickly

One way to solve this is to reduce the RTC engine's responsibility to routing traffic between externally delivered endpoints. Such endpoints could handle various protocols and serve different purposes. Apart from the WebRTC endpoint, we could have:

- Broadcasting endpoint (HLS/DASH/RTMP)
- Recording endpoint (mp4/WebM)
- An endpoint reading a recorded file and sending it to the RTC engine
- SIP endpoint
- RTP dump endpoint
- ...

The RTC Engine could then be aware only of routing concerns: peers, tracks (their metadata, possibly also codecs), distribution, etc.

Implementation-wise, RTC endpoints could be implementations of an RTC.Endpoint behaviour. Each RTC endpoint could provide a piece of pipeline to be linked into the engine. Possibly it could be a Membrane.Bin exposing compatible pads and handling messages defined in an API. For example, to make the WebRTC.Endpoint work with the RTC engine, we would possibly have to wrap it into the RTC.Endpoint implementation - that way the WebRTC.Endpoint wouldn't rely on the RTC Engine nor the other way around. Then, when adding/accepting a new peer, the user could choose which RTC.Endpoint implementation to use for that peer.
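To make the behaviour idea a bit more concrete, here is a minimal sketch in plain Elixir. Everything in it is an assumption for illustration: the callback names (`init/1`, `handle_track_added/2`, `handle_track_removed/2`), the track map shape, and the `RecordingEndpoint` module are hypothetical, and the sketch leaves out the Membrane.Bin and pad-linking part entirely.

```elixir
defmodule RTC.Endpoint do
  @moduledoc """
  Hypothetical behaviour sketch. The callback names are assumptions,
  not an existing Membrane API.
  """

  @type track :: %{id: String.t(), type: :audio | :video, codec: atom()}
  @type state :: term()

  @doc "Builds the endpoint state from user-supplied options."
  @callback init(opts :: keyword()) :: {:ok, state()}

  @doc "Called when the engine routes a new track to this endpoint."
  @callback handle_track_added(track(), state()) :: {:ok, state()}

  @doc "Called when a previously routed track is removed."
  @callback handle_track_removed(track_id :: String.t(), state()) :: {:ok, state()}
end

defmodule RecordingEndpoint do
  @moduledoc "Toy endpoint that only remembers which tracks it would record."
  @behaviour RTC.Endpoint

  @impl true
  def init(opts) do
    {:ok, %{path: Keyword.get(opts, :path, "out.mp4"), tracks: %{}}}
  end

  @impl true
  def handle_track_added(track, state) do
    # Store the track under its id so it can be removed later.
    {:ok, put_in(state.tracks[track.id], track)}
  end

  @impl true
  def handle_track_removed(track_id, state) do
    {:ok, update_in(state.tracks, &Map.delete(&1, track_id))}
  end
end
```

With a contract like this, the engine would only need to know the behaviour, and a wrapper around the WebRTC.Endpoint would be just one more implementation of it.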

Since the recording endpoint, for example, can be spawned per user or per entire room, possibly both the RTC endpoint implementation and the user choosing it should be able to decide which streams the endpoint instance should receive (or where its streams should be sent).
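One possible way to express "who receives which stream" is a per-endpoint filter that either the endpoint implementation supplies or the user overrides. The sketch below is only an illustration of that idea: the `RoutingSketch` module, the `filter` field, and the `origin` key on tracks are all hypothetical names.

```elixir
defmodule RoutingSketch do
  @moduledoc "Hypothetical routing table: which endpoints receive a given track."

  # Returns the ids of all endpoints that should receive `track`.
  # An endpoint never receives its own track back.
  def targets(track, endpoints) do
    for {id, %{filter: filter}} <- endpoints,
        id != track.origin,
        filter.(track),
        do: id
  end
end

endpoints = %{
  "alice" => %{filter: fn _track -> true end},
  "bob" => %{filter: fn _track -> true end},
  # Per-room recorder: wants every track.
  "room_recorder" => %{filter: fn _track -> true end},
  # Per-user recorder: wants only tracks originating from alice.
  "alice_recorder" => %{filter: fn track -> track.origin == "alice" end}
}

alice_video = %{id: "t1", origin: "alice", type: :video}

# Sorted because map iteration order should not be relied on.
alice_video |> RoutingSketch.targets(endpoints) |> Enum.sort() |> IO.inspect()
```

Here the per-room recorder and the per-user recorder are the same kind of endpoint and differ only in the filter they were started with.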

To get there from the current state, we should remove all the WebRTC-specific stuff from the RTC engine and move it either to the WebRTC.Endpoint or to the new RTC.Endpoint implementation for the WebRTC.Endpoint. We should remember that the WebRTC.Endpoint should remain usable without the RTC.Engine.

Let me know what you think ;)

I like the idea of being able to plug in different endpoints.

One idea we had was to identify clear callbacks, and have one process for the Membrane.RTC.Engine rather than having to start a separate process to keep track of application state.

defmodule MyApp.Room do
  use Membrane.RTC.Engine

  @impl Membrane.RTC.Engine
  def initialize(state) do
    # ....
    {:ok, %{}} ## saves into app-specific state
  end

  @spec new_peer(map(), map()) ::
          :accept
          | {:accept, node()}
          | {:accept, endpoint_module :: module()}
          | {:accept, endpoint_module :: module(), node()}
  def new_peer(_peer_data, _app_state) do
    :accept
  end

  @spec handle_track_added(map(), list(peer()), map()) ::
          {:link, list(peer()), map()} | {:ignore, map()}
  def handle_track_added(new_track, other_peers, app_state) do
    if some_logic?(new_track) do
      peer_subset = Enum.filter(other_peers, &select_some_peers/1)
      {:link, peer_subset, app_state}
    else
      {:ignore, app_state}
    end
  end

  @impl Membrane.RTC.Engine
  def handle_peer_ready(new_peer, other_peer_tracks, app_state) do
    track_subset = Enum.filter(other_peer_tracks, &select_some_tracks/1)
    {:link, track_subset, app_state}
  end
end

So the places where the engine calls send/2 to the callback process could all be replaced by callback functions. Maybe there would be places where the callbacks would be able to choose between different types of endpoints.

The types of things we know we do:

- Custom logic on engine start:
  - extension configuration
  - per-codec elements
  - registration of the process (:global, :syn, postgres, etc.)
- A timer where we can decide to stop the engine.
  - Theoretically we could stop the room when the last peer leaves, but if callback functions are used, it might be nice to have handle_info/2 call through to something, kind of like how Membrane delegates to handle_other/3.
- A new peer joins.
  - decide whether to accept/reject the peer.
  - decide what node to start the peer on.
- A new track is added for an existing peer.
- A track is removed from an existing peer.
- The signaling channel process goes down.
- The peer ICE connection goes away.
  - Update registries
- Engine termination.

Whether or not different types of Endpoint bins exist, I think that soon we'll want to be able to:

- Send a message from outside the engine to a specific endpoint.
  - Enable/disable track filters
- Link/unlink specific elements or bins in the main pipeline.
  - Assuming there are 30 participants, but we only want to show 10 videos at a time, we may want to keep all audio tracks linked to all other endpoints, but link only 10 outbound video tracks to each participant. In this case, the application may want to change the video tracks, either depending on VAD or on a person selecting something in the UI.
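The 30-participants case could be decided by a small pure function on the application side. The sketch below is only one way to do it, under assumed names: `TrackSelection`, the `last_vad_at` field (a timestamp of the last detected voice activity), and the list of pinned track ids chosen in the UI are all hypothetical.

```elixir
defmodule TrackSelection do
  @moduledoc "Hypothetical helper. The track map shape is an assumption."

  # Keeps every audio track and at most `max_videos` video tracks,
  # preferring pinned tracks first, then the most recently active speakers.
  def select(tracks, pinned_ids, max_videos) do
    {audios, videos} = Enum.split_with(tracks, &(&1.type == :audio))

    chosen_videos =
      videos
      # `false` sorts before `true`, so pinned tracks come first;
      # the negated timestamp puts the latest voice activity first.
      |> Enum.sort_by(fn t -> {t.id not in pinned_ids, -t.last_vad_at} end)
      |> Enum.take(max_videos)

    audios ++ chosen_videos
  end
end

tracks = [
  %{id: "a1", type: :audio, last_vad_at: 0},
  %{id: "v1", type: :video, last_vad_at: 10},
  %{id: "v2", type: :video, last_vad_at: 30},
  %{id: "v3", type: :video, last_vad_at: 20}
]

# "v1" is pinned in the UI, the remaining slot goes to the latest speaker.
tracks |> TrackSelection.select(["v1"], 2) |> Enum.map(& &1.id) |> IO.inspect()
```

Comparing the result with the currently linked set would give the tracks to link and unlink whenever VAD or the UI selection changes.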

> One idea we had was to identify clear callbacks, and have one process for the Membrane.RTC.Engine rather than having to start a separate process to keep track of application state.

Our approach was basically to do that in the next API layer, which would handle application state management, but also some common things that conferencing apps have, like user roles (admin/registered/guest), join queues, hand raising etc. That one could be callback-based, so more like a framework. But underneath we planned to keep an engine with a simple request-response API (more like a library), that can be developed separately. It's definitely possible that we're overcomplicating things (again :P), though I think it would be great to keep user state management and non-media features somehow separate. However, we can switch to callbacks now anyway and recreate the intermediary API when things grow too much.

What was already suggested, and what we can do right away, is wrapping sends into functions, so they don't have to be done manually. That would make it possible to easily change them into GenServer.calls to make things more synchronous.
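A rough sketch of what wrapping the sends could look like, using a plain GenServer as a stand-in for the engine process. `EngineSketch` and its message shapes are hypothetical and not the real Membrane.RTC.Engine API; the point is only that callers go through functions, so one of them can move from send/2 to GenServer.call/2 without the callers building messages themselves.

```elixir
defmodule EngineSketch do
  @moduledoc "Hypothetical stand-in for the engine process."
  use GenServer

  # Client API: callers never build the message tuples themselves.

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, opts)

  # Still asynchronous: a thin wrapper around send/2.
  def remove_peer(engine, peer_id) do
    send(engine, {:remove_peer, peer_id})
    :ok
  end

  # Already switched to a synchronous call; the caller just gets :ok back.
  def add_peer(engine, peer_id), do: GenServer.call(engine, {:add_peer, peer_id})

  def peers(engine), do: GenServer.call(engine, :peers)

  # Server callbacks

  @impl true
  def init(_opts), do: {:ok, MapSet.new()}

  @impl true
  def handle_call({:add_peer, peer_id}, _from, peers) do
    {:reply, :ok, MapSet.put(peers, peer_id)}
  end

  def handle_call(:peers, _from, peers), do: {:reply, Enum.sort(peers), peers}

  @impl true
  def handle_info({:remove_peer, peer_id}, peers) do
    {:noreply, MapSet.delete(peers, peer_id)}
  end
end

{:ok, engine} = EngineSketch.start_link()
:ok = EngineSketch.add_peer(engine, "alice")
:ok = EngineSketch.remove_peer(engine, "alice")
IO.inspect(EngineSketch.peers(engine))
```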

That said, if I understand correctly, even though this is somehow related to modularization, it seems to me like a separate issue - not sure how it would help parallelising work on the RTC engine itself. I think we should choose what to focus on ;)

fishjam-dev / membrane_rtc_engine

Make RTC engine modular #33