blindsidenetworks / scalelite

Scalable load balancer for BigBlueButton.
GNU Affero General Public License v3.0

Support for tagged servers (ready for review) #1049

Closed Ithanil closed 2 months ago

Ithanil commented 4 months ago

Description

This PR implements a new feature called "Server Tags" or "Tagged Servers".

It is supposed to work as follows:

  1. When adding a server in Scalelite, you can optionally specify a non-empty string as the "tag" for the server. By default, the tag is nil.
  2. When making a "create" API call to Scalelite, you can optionally pass a meta_server-tag string as a parameter. If passed, it is handled as follows:
    • If the last character of meta_server-tag is not a '!', the tag is interpreted as optional. The meeting is created on the lowest-load server with the corresponding tag if one is available, or otherwise on the lowest-load untagged (i.e. tag == nil) server.
    • If the last character of meta_server-tag is a '!', this character is stripped and the remaining tag is interpreted as required. The meeting is created on the lowest-load server with the corresponding tag, or creation fails (with a specific error message) if no matching server is available.
  3. Create calls with no meta_server-tag, or with '' or '!' as its value, only match untagged servers. So, for a frontend unaware of the feature, Scalelite behaves as before, provided a pool of untagged ("default") servers is maintained. It is recommended to always add your default servers as untagged servers.
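
The rules above can be illustrated with a minimal in-memory sketch. It is not the PR's implementation: the `Server` struct, `NoServerError`, `parse_server_tag`, `pick_server`, and the sample server list are hypothetical names that stand in for the Redis-backed lookup.

```ruby
# Hypothetical in-memory stand-in for a registered server.
Server = Struct.new(:id, :tag, :load)

class NoServerError < StandardError; end

# Split a raw meta_server-tag value into [tag, required].
# A trailing '!' marks the tag as required; nil, '' and '!' all yield a nil tag.
def parse_server_tag(raw)
  tag = raw.to_s
  required = tag.end_with?('!')
  tag = tag[0..-2] if required
  tag = nil if tag.empty?
  [tag, required]
end

# Pick the lowest-load server according to the optional/required tag rules.
def pick_server(servers, raw_tag)
  tag, required = parse_server_tag(raw_tag)
  raise NoServerError, 'Could not find any available servers.' if servers.empty?

  if tag && servers.none? { |s| s.tag == tag }
    raise NoServerError, "Could not find any available servers with tag=#{tag}." if required

    tag = nil # optional tag: fall back to untagged servers
  end

  candidates = servers.select { |s| s.tag == tag }
  raise NoServerError, 'Could not find any available servers.' if candidates.empty?

  candidates.min_by(&:load)
end

# Sample data (made up for illustration).
SERVERS = [
  Server.new('default-1', nil, 5.0),
  Server.new('default-2', nil, 2.0),
  Server.new('large-1', 'large', 9.0)
].freeze

puts pick_server(SERVERS, 'large').id # large-1 (matching tag)
puts pick_server(SERVERS, 'beta').id  # default-2 (optional tag, falls back to untagged)
puts pick_server(SERVERS, '!').id     # default-2 (treated as untagged)
```

With this sample data, `pick_server(SERVERS, 'beta!')` raises `NoServerError`, because the tag is required and no server carries it.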

This feature lets admins offer users a choice of specially configured servers (e.g. optimized for very large conferences) or servers running newer, potentially less stable BBB versions, without needing dedicated LB + frontend infrastructures. It might be very useful for the transition to BBB 3.0, which I know many admins are wary of because of the number of changes in the BBB backend.

Note: In the current handling of the create call, Server.find_available is always executed before checking for existing meetings, so any error raised in that method makes the create call fail (even if the meeting already exists). This could already happen before this PR if no valid servers were left in server_load. Now it can also happen if no servers with a required tag are present, or if no untagged servers are present when falling back from an optional tag. Merging https://github.com/blindsidenetworks/scalelite/pull/1050 would change this so that an existing meeting is selected even if find_available would fail (for any reason).
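
The ordering problem described in this note can be sketched with in-memory stand-ins. This is only an illustration of the two orderings, not the code of this PR or of PR 1050: `MEETINGS`, `find_available`, `create_select_first`, and `create_lookup_first` are hypothetical names.

```ruby
class RecordNotFound < StandardError; end

# Hypothetical meeting store: meeting ID => ID of the server hosting it.
MEETINGS = { 'existing-room' => 'server-a' }

# Stand-in for server selection: fails when no server is available.
def find_available(servers)
  raise RecordNotFound, 'Could not find any available servers.' if servers.empty?

  servers.first
end

# Ordering described in the note: selection runs first, so its error
# aborts the call even when the meeting already exists.
def create_select_first(meeting_id, servers)
  server = find_available(servers)
  MEETINGS[meeting_id] ||= server
end

# Alternative ordering: look up the meeting first and only select a
# server when a new meeting actually has to be created.
def create_lookup_first(meeting_id, servers)
  MEETINGS[meeting_id] ||= find_available(servers)
end

puts create_lookup_first('existing-room', []) # server-a, although no server is available
```

With an empty server list, `create_select_first('existing-room', [])` raises `RecordNotFound`, while the lookup-first variant still returns the server of the existing meeting.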

TODO:

Testing Steps

For the code added so far, I have added corresponding automated tests. The code is also already used in an actual production deployment, but in combination with https://github.com/blindsidenetworks/scalelite/pull/1050 and https://github.com/blindsidenetworks/scalelite/pull/1052.

Ideas for Greenlight

I imagine a per-role configuration of allowed tags, analogous to the allowed recording visibilities configuration. The difference, however, is that here we need a configurable list of possible tags, unlike the fixed/hardcoded list of possible recording visibilities.

Other ideas

It turned out that one could also "abuse" this feature to achieve per-tenant server pools by enforcing a certain meta_server-tag via OVERRIDE_CREATE_PARAMS for each (or some) tenants. However, I think it would be better to have this as an explicit feature. It could be implemented in a very similar way to tags, by allowing a list of tenant IDs to be set for a server to restrict its usage to them. I'll wait for a review of the current PRs before working on this.
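
As a rough sketch of the per-tenant idea, forcing a required tag amounts to merging a tenant-specific override over the incoming create parameters. The table and helper below are hypothetical and do not reflect the actual format or handling of OVERRIDE_CREATE_PARAMS in Scalelite; the tenant names and tags are made up.

```ruby
# Hypothetical per-tenant override table (assumed shape, not Scalelite's format).
TENANT_OVERRIDES = {
  'tenant-a' => { 'meta_server-tag' => 'pool-a!' }
}.freeze

# Merge the tenant's overrides over the request parameters;
# an override wins over whatever the frontend sent.
def apply_overrides(tenant, params)
  params.merge(TENANT_OVERRIDES.fetch(tenant, {}))
end

request = { 'name' => 'Weekly sync', 'meta_server-tag' => 'large' }
puts apply_overrides('tenant-a', request)['meta_server-tag'] # pool-a!
puts apply_overrides('tenant-b', request)['meta_server-tag'] # large
```

Because the enforced tag ends in '!', meetings of that tenant would only ever land on servers with the matching tag, which is what makes the tag behave like a dedicated pool.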

Merge with PR 1052

After merging https://github.com/blindsidenetworks/scalelite/pull/1052, find_available in app/models/server.rb will look like this:

def self.find_available(tag_arg = nil)
  # Check if tag is required
  tag = tag_arg.presence
  tag_required = false
  if !tag.nil? && tag[-1] == '!'
    tag = tag[0..-2].presence # tag[0..-2] is always a String here; presence turns '' into nil
    tag_required = true
  end

  # Find the available, matching server with the lowest load
  with_connection do |redis|
    ids_loads = redis.zrange('server_load', 0, -1, with_scores: true)
    raise RecordNotFound.new("Could not find any available servers.", name, nil) if ids_loads.blank?
    if !tag.nil? && ids_loads.none? { |myid, _| redis.hget(key(myid), 'tag') == tag }
      raise RecordNotFound.new("Could not find any available servers with tag=#{tag}.", name, nil) if tag_required
      tag = nil # fall back to servers without tag
    end
    ids_loads = ids_loads.select { |myid, _| redis.hget(key(myid), 'tag') == tag }
    id, load, hash = ids_loads.each do |id, load|
      hash = redis.hgetall(key(id))
      break id, load, hash if hash.present?
    end
    raise RecordNotFound.new("Could not find any available servers.", name, id) if hash.blank?

    hash['id'] = id
    if hash['state'].present?
      hash['state'] = 'enabled' # all servers in server_load set are enabled
    else
      hash['enabled'] = true
    end
    hash['load'] = load
    hash['online'] = (hash['online'] == 'true')
    new.init_with_attributes(hash)
  end
end

ffdixon commented 3 months ago

Thanks -- will review.

farhatahmad commented 2 months ago

@Ithanil Can this be rebased please?

Ithanil commented 2 months ago

@farhatahmad I opted for a merge, because a rebase is a bit convoluted in this case. FYI: We have been using the present code + https://github.com/blindsidenetworks/scalelite/pull/1050 in production since Monday, with no indication of any problems so far.