ant-media / Ant-Media-Server

Ant Media Server is a live streaming engine software that provides adaptive, ultra low latency streaming by using WebRTC technology with ~0.5 seconds latency. Ant Media Server is auto-scalable and it can run on-premise or on-cloud.
https://antmedia.io

Support to merge video as single track and send audio tracks separately in MCU #3715

Open mekya opened 3 years ago

mekya commented 3 years ago

Is your feature request related to a problem? Please describe.

When the MCU feature is used in conferencing, users can also hear their own voice. Mixing audio separately for each user on the server side is not efficient. There should be a more efficient way to solve this problem.

Describe the solution you'd like

Ant Media Server has started to support multiple tracks in a single session, so the videos can be merged into a single stream while the audio tracks are sent to the viewer separately. The user can then simply ask the server not to send his/her own audio track.

kputyra commented 3 years ago

Whichever solution you choose, I would like to keep the audio MCU as an option. Our case:

  1. Some participants join a conference remotely via a website.
  2. Some participants are in a lecture hall with a ceiling microphone and a loudspeaker.
  3. Remote participants fetch individual audio streams that are merged in their browsers.
  4. The loudspeaker in the lecture hall is fed with the combined audio from all remote participants (that is, everything except the ceiling microphone).

We have already implemented 1-3 and for 4 we plan to use the audio MCU feature for remote streams and then ffmpeg to fetch the combined stream prepared by Ant Media. We have our own framework to manage meetings, which does not rely on Ant Media's conference rooms.

As you can see, we plan to use the MCU feature for a single participant of a conference (the lecture hall). This looks like a partial audio MCU:

kputyra commented 3 years ago

Another possible solution could be to send both the merged stream and the client's own stream, synchronized, so that the client can cancel its own audio locally.
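A rough sketch of what client-side cancelling could look like, assuming raw PCM samples and a known delay between the client's own stream and its copy inside the merged stream. The function name `cancel_own_audio` and the `delay` parameter are hypothetical and not part of Ant Media Server. With a lossy codec such as Opus the subtraction would not be exact, so this only illustrates the idea.

```python
from typing import List


def cancel_own_audio(merged: List[int], own: List[int], delay: int = 0) -> List[int]:
    """Subtract the client's own samples from the merged mix.

    Hypothetical helper: assumes both streams are uncompressed PCM and that
    own[k] appears in the merged stream at index k + delay.
    """
    result = []
    for i, mixed_sample in enumerate(merged):
        j = i - delay  # index of the matching sample in the client's own stream
        own_sample = own[j] if 0 <= j < len(own) else 0
        result.append(mixed_sample - own_sample)
    return result


# Example: the other participants contribute a constant 1, and the client's
# own audio arrives in the mix one sample late.
own = [5, 6, 7]
merged = [1, 1 + 5, 1 + 6, 1 + 7]
others_only = cancel_own_audio(merged, own, delay=1)
```

The hard part in practice is the synchronization mentioned above: if `delay` is wrong, the subtraction adds an echo instead of removing one.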

peterzanetti commented 3 years ago

Honestly this solution is not efficient for end users. MCU is ideal for audio because it sends one single reliable audio stream to the end user. Audio is far more important than video in any conference, as minor hiccups in video usually go unnoticed, whereas any minor hiccup in audio is disruptive to the conference. This is why we currently only run video through AMS, and we do audio through another MCU, because it is critically important that audio be delivered without issue. Trying to send multiple audio streams to a user creates more overhead in the user's browser and increases the likelihood of drops and connection issues, not to mention bandwidth.

While "Mixing audio for each user on the server side is not an efficient way" may be true, it is the correct way to do it. You need to be able to deliver a single audio stream to the user without their own audio for the best possible USER experience. The server needs to do the work to make this experience possible.
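For illustration, the per-user mix described here is often called "mix-minus". A minimal sketch, assuming equal-length frames of 16-bit PCM samples keyed by a participant id: it sums all participants once and then subtracts each participant's own samples, so the cost grows linearly with the number of participants. The function name `mix_minus` and the data layout are assumptions for this example, not Ant Media Server's implementation.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """For every participant, return the mix of all *other* participants.

    Hypothetical helper: `frames` maps a participant id to one frame of
    16-bit PCM samples; all frames must have the same length.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(samples) != length for samples in frames.values()):
        raise ValueError("all frames must have the same length")

    # Sum every participant once (unclipped, Python ints do not overflow).
    total = [0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample

    # Remove each participant's own contribution, then clip to the int16 range.
    return {
        pid: [max(INT16_MIN, min(INT16_MAX, t - s)) for t, s in zip(total, samples)]
        for pid, samples in frames.items()
    }


mixes = mix_minus({"a": [100, 200], "b": [10, 20], "c": [1, 2]})
```

Clipping happens only after the subtraction, so a loud participant does not distort the mixes sent to the others. A real server would still have to encode one audio stream per participant, which is likely where most of the cost lies.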

mekya commented 3 years ago

Hi guys, thank you for your comments @peterzanetti @kputyra

As far as I understand, @peterzanetti, you are in favor of making the server hardware powerful enough to mix the audio for each user separately?

peterzanetti commented 3 years ago

Yes. Obviously that means within reason. It would depend on what the actual server performance was like for a large conference with many users. But this is the method our current MCU audio vendor uses because it is in fact ideal.

mekya commented 3 years ago

OK. Thank you @peterzanetti for your feedback.