bigbluebutton / bbb-webrtc-sfu

Control server for WebRTC SFU
GNU Lesser General Public License v3.0

ROUND_ROBIN balancing strategy: make it production ready #65

Closed jibon57 closed 2 years ago

jibon57 commented 4 years ago

Hello,

I was trying to add two external Kurento servers like this:

- ip: External_IP_1
  url: ws://External_IP_1:8888/kurento
  # mediaType: (main|audio|content)
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000
- ip: External_IP_2
  url: ws://External_IP_2:8888/kurento
  # mediaType: (main|audio|content)
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000

I also set balancing-strategy: ROUND_ROBIN, but bbb-webrtc-sfu only ever selected the first server and never the second. Am I configuring this correctly? Please help me find my mistake.

Log:

2020-09-28T09:45:47.359Z - info: [mcs-freeswitch-esl-wrapper] Connected to FreeSWITCH ESL
2020-09-28T09:45:47.369Z - info: [mcs-balancer] Available hosts => [{"url":"ws://External_IP_1:8888/kurento","ip":"External_IP_1","mediaType":"all"}]
2020-09-28T09:45:47.371Z - info: [mcs-balancer] Available hosts => [{"url":"ws://External_IP_1:8888/kurento","ip":"External_IP_1","mediaType":"all"},{"url":"ws://External_IP_2:8888/kurento","ip":"External_IP_2","mediaType":"all"}]

Thanks

jibon57 commented 4 years ago

I think I've found the reason: https://github.com/bigbluebutton/bbb-webrtc-sfu/blob/6b26493b0c17bca2b667779b8498abeb66310b3c/lib/mcs-core/lib/media/balancer.js#L181 Maybe it waits until the video-transposing-ceiling and audio-transposing-ceiling values are crossed before it moves to the next server.

prlanzarin commented 4 years ago

Maybe it waits until the video-transposing-ceiling and audio-transposing-ceiling values are crossed before it moves to the next server.

That's correct. It does a round robin based on the configured thresholds. As it is now, it puts every media stream (inbound or outbound) into the round-robin scheme following the configured *-ceilings. If there's a host mismatch between streams (e.g. subscriber media X is on KMS 1, publisher media Y is on KMS 2), an RTP bridge is set up between KMS 1 and 2 to connect Y to X.
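To illustrate why only the first server gets picked at first, here is a minimal sketch of a threshold-based round robin. It is not the code in balancer.js: the class name `CeilingRoundRobin`, its fields, and the rule "stay on the current host until the ceiling is reached, then advance" are all assumptions made for illustration, and the ceiling value of 3 is arbitrary.

```javascript
// Hypothetical sketch; NOT the actual balancer.js logic.
// Assumption: the balancer keeps placing media on the current host until
// the configured ceiling is reached, and only then moves to the next host.
class CeilingRoundRobin {
  constructor(hosts, ceiling) {
    if (hosts.length === 0) throw new Error('at least one host is required');
    this.hosts = hosts;
    this.ceiling = ceiling; // stands in for e.g. video-transposing-ceiling
    this.current = 0;       // index of the host currently receiving media
    this.allocated = 0;     // media placed on that host since the last rotation
  }

  // Returns the host that should receive the next media stream.
  next() {
    if (this.allocated >= this.ceiling) {
      this.current = (this.current + 1) % this.hosts.length;
      this.allocated = 0;
    }
    this.allocated += 1;
    return this.hosts[this.current];
  }
}

const ceilingBalancer = new CeilingRoundRobin(['External_IP_1', 'External_IP_2'], 3);
const picks = Array.from({ length: 5 }, () => ceilingBalancer.next());
console.log(picks);
```

Under these assumptions the first three streams land on External_IP_1 and the second server is only used from the fourth stream on, which would match the behaviour reported above when few streams are active.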

I don't recommend using ROUND_ROBIN in production yet, though. It's in an experimental state and has problems with unintended transcoding due to how media streams are transposed between Kurento instances. The transposing is also done over plain RTP at the moment, so the transposed media is not DTLS-SRTP encrypted.

You're better off with MEDIA_TYPE for the time being. I plan on reviewing the ROUND_ROBIN strategy soon. I'll probably make it simpler: a true, naive inbound media round robin. That will make it usable and easier to maintain.
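For comparison, a naive round robin of the kind described here could look roughly like the sketch below. The function name `createRoundRobin` is hypothetical and this is not the planned implementation; it only shows the idea of handing each new inbound media stream to the next host in turn, with no thresholds involved.

```javascript
// Hypothetical sketch of a naive round robin; names are illustrative only.
function createRoundRobin(hosts) {
  if (hosts.length === 0) throw new Error('at least one host is required');
  let index = 0;
  // Each call returns the next host in order, wrapping around at the end.
  return function nextHost() {
    const host = hosts[index];
    index = (index + 1) % hosts.length;
    return host;
  };
}

const nextHost = createRoundRobin(['External_IP_1', 'External_IP_2']);
const order = [nextHost(), nextHost(), nextHost()];
console.log(order);
```

With two hosts, consecutive streams alternate between them from the very first allocation.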

jibon57 commented 4 years ago

Thanks for your response, @prlanzarin. Can I use multiple KMS servers with MEDIA_TYPE? For example, Server 1 and Server 2 for webcams only:

- ip: External_IP_1
  url: ws://External_IP_1:8888/kurento
  mediaType: main
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000
- ip: External_IP_2
  url: ws://External_IP_2:8888/kurento
  mediaType: main
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000
- ip: External_IP_3
  url: ws://External_IP_3:8888/kurento
  mediaType: audio
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000
- ip: External_IP_3
  url: ws://External_IP_3:8888/kurento
  mediaType: content
  ipClassMappings:
    local:
    private:
    public:
  options:
    failAfter: 5
    request_timeout: 30000
    response_timeout: 30000
kurentoStartupRetries: 10
balancing-strategy: MEDIA_TYPE
video-transposing-ceiling: 1
audio-transposing-ceiling: 1

Is that the correct way to set it up?

prlanzarin commented 4 years ago

@jibon57 That's the correct way, but I also don't recommend doing that. What MEDIA_TYPE with multiple KMSs per media type does is use the aforementioned ROUND_ROBIN algorithm under the hood (which has the same problems I mentioned before).
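A rough sketch of that behaviour: filter the hosts by media type, then rotate among the candidates. Everything here is an assumption for illustration. The function `createMediaTypeBalancer` is hypothetical, the host objects only mirror the `ip` and `mediaType` keys from the config above, treating a host with mediaType "all" as eligible for every type is a guess based on the log output earlier in the thread, and a plain rotation stands in for the threshold-based ROUND_ROBIN to keep the example short.

```javascript
// Hypothetical sketch; not the actual bbb-webrtc-sfu implementation.
function createMediaTypeBalancer(hosts) {
  const cursors = new Map(); // mediaType -> index of the next candidate to use

  return function pick(mediaType) {
    // Assumption: a host whose mediaType is "all" can serve any media type.
    const candidates = hosts.filter(
      (h) => h.mediaType === mediaType || h.mediaType === 'all'
    );
    if (candidates.length === 0) {
      throw new Error(`no host configured for media type ${mediaType}`);
    }
    // With one candidate this always returns the same host; with several,
    // the choice rotates, which is where the round-robin issues come in.
    const i = cursors.get(mediaType) || 0;
    cursors.set(mediaType, (i + 1) % candidates.length);
    return candidates[i];
  };
}

const pickHost = createMediaTypeBalancer([
  { ip: 'External_IP_1', mediaType: 'main' },
  { ip: 'External_IP_2', mediaType: 'main' },
  { ip: 'External_IP_3', mediaType: 'audio' },
]);
const mainPicks = [pickHost('main').ip, pickHost('main').ip, pickHost('main').ip];
const audioPicks = [pickHost('audio').ip, pickHost('audio').ip];
console.log(mainPicks, audioPicks);
```

In this sketch the two "main" hosts alternate, while the single "audio" host is always selected, so no media ever has to be bridged between instances of the same type.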

I'd recommend sticking to a single instance per media type right now. As I mentioned, I'll probably revisit the ROUND_ROBIN algorithm soon and then you'll be able to use it in production if you wish. I'll keep this issue open until it ships.

jibon57 commented 4 years ago

Thanks for the response, @prlanzarin. Yes, you're right: a single instance per media type gives much better results than ROUND_ROBIN. I'll keep using it until you rework ROUND_ROBIN. Thanks for the suggestion :)

jibon57 commented 2 years ago

@prlanzarin I don't think this is required anymore. You did an excellent job integrating mediasoup. I saw a huge performance improvement using the mediasoup server.