bigbluebutton / docker

Docker files for BigBlueButton
GNU Lesser General Public License v3.0

NGINX: Expects 8 html5-frontend instances which leads to long startup/connection time when starting a meeting #113

Open maxee opened 3 years ago

maxee commented 3 years ago

Problem

The nginx service expects 8 html5-frontend instances for load balancing in /etc/nginx/conf.d/default.conf:

upstream poolhtml5servers {
  zone poolhtml5servers 32k;
  least_conn;
  server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;
  server 10.7.7.201:4101 fail_timeout=120s max_fails=1;
  server 10.7.7.202:4102 fail_timeout=120s max_fails=1;
  server 10.7.7.203:4103 fail_timeout=120s max_fails=1;
  server 10.7.7.204:4104 fail_timeout=120s max_fails=1;
  server 10.7.7.205:4105 fail_timeout=120s max_fails=1;
  server 10.7.7.206:4106 fail_timeout=120s max_fails=1;
  server 10.7.7.207:4107 fail_timeout=120s max_fails=1;
}

However, the default env file .env.sample enables only one html5-frontend instance (the one declared as backup in the nginx config).

Since the other seven instances are never started and are therefore unreachable, nginx cycles through all of them first. Only after connecting to every one of them has failed does it fall back to the (only working) backup instance.

This behavior adds at least 10 seconds to the loading/connection time.

nginx_1                | 2021/05/15 10:39:48 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.203:4103/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:48 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.203:4103/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:52 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.207:4107/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:52 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.207:4107/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:55 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.204:4104/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:55 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.204:4104/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
bbb-web_1              | 2021-05-15T10:39:55.698Z DEBUG o.b.web.controllers.ApiController - ApiController#index
kurento_1              | 14:50:00.247595291     1 0x7f0250001400 INFO    KurentoWebSocketTransport WebSocketTransport.cpp:346:keepAliveSessions: Keep-Alive for session 'b3fc38ca-2843-4bff-a3aa-789686ec996c'
nginx_1                | 2021/05/15 10:39:58 [error] 34#34: *2056589 connect() failed (113: Host is unreachable) while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.201:4101/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 2021/05/15 10:39:58 [warn] 34#34: *2056589 upstream server temporarily disabled while connecting to upstream, client: 127.0.0.1, server: _, request: "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1", upstream: "http://10.7.7.201:4101/html5client/join?sessionToken=6xqazym6d2ogsrs0", host: "xxx", referrer: "https://xxx/b/cer-ihk-odp-sql"
nginx_1                | 127.0.0.1 - - [15/May/2021:10:39:58 +0000] "GET /html5client/join?sessionToken=6xqazym6d2ogsrs0 HTTP/1.1" 200 5096 "https://xxx/b/cer-ihk-odp-sql" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"

Workaround

Method 1

Change NUMBER_OF_FRONTEND_NODEJS_PROCESSES=1 in .env to 8, then rebuild and restart the whole stack.
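
For reference, the steps look roughly like this (a sketch; the exact rebuild/restart commands may differ depending on how the compose stack is deployed):

# .env: raise the number of html5-frontend instances to match the upstream pool
NUMBER_OF_FRONTEND_NODEJS_PROCESSES=8

# rebuild the images and restart the stack
docker-compose build
docker-compose up -d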

Method 2

Comment out the seven unused frontend instances and keep only the current backup instance (no longer marked as backup):

  1. docker-compose exec nginx /bin/ash
  2. vi /etc/nginx/conf.d/default.conf
  3. Edit the file as follows:
map $remote_addr $freeswitch_addr {
    "~:"    [::1];
    default    10.7.7.1;
}

upstream poolhtml5servers {
  zone poolhtml5servers 32k;
  least_conn;
  server 10.7.7.200:4100 fail_timeout=10s; # max_fails=4 backup;
#  server 10.7.7.201:4101 fail_timeout=120s max_fails=1;
#  server 10.7.7.202:4102 fail_timeout=120s max_fails=1;
#  server 10.7.7.203:4103 fail_timeout=120s max_fails=1;
#  server 10.7.7.204:4104 fail_timeout=120s max_fails=1;
#  server 10.7.7.205:4105 fail_timeout=120s max_fails=1;
#  server 10.7.7.206:4106 fail_timeout=120s max_fails=1;
#  server 10.7.7.207:4107 fail_timeout=120s max_fails=1;
}

server {
  listen 8080 default_server;
  listen [::]:8080 default_server;
  server_name _;
  access_log /dev/stdout;
  absolute_redirect off;
  root /www/;

  # opt-out of google's floc tracking
  # https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
  add_header Permissions-Policy "interest-cohort=()";

  # redirect to greenlight
  location = / {
      return 302 /b;
  }

  # Include specific rules for record and playback
  include /etc/nginx/bbb/*.nginx;

}
  4. exit
  5. docker-compose restart nginx

Solution

The nginx config should only list as many html5-frontend instances as are enabled in .env.
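
One way to implement this (a sketch only, assuming the existing 10.7.7.20x/410x addressing scheme and that NUMBER_OF_FRONTEND_NODEJS_PROCESSES is available when the nginx container starts; the script itself is hypothetical, not the project's actual tooling) would be to generate the upstream block instead of hard-coding it:

#!/bin/sh
# Hypothetical generate-poolhtml5servers.sh: print an upstream block that only
# lists the html5-frontend instances that are actually enabled. The output
# would replace the hard-coded block in /etc/nginx/conf.d/default.conf.
N="${NUMBER_OF_FRONTEND_NODEJS_PROCESSES:-1}"
echo "upstream poolhtml5servers {"
echo "  zone poolhtml5servers 32k;"
echo "  least_conn;"
if [ "$N" -le 1 ]; then
  # a single instance: nothing to balance, so no backup flag (as in Method 2)
  echo "  server 10.7.7.200:4100 fail_timeout=10s;"
else
  echo "  server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;"
  i=1
  while [ "$i" -lt "$N" ]; do
    # matches the existing 10.7.7.201-207 / 4101-4107 scheme (N <= 8)
    echo "  server 10.7.7.20$i:410$i fail_timeout=120s max_fails=1;"
    i=$((i + 1))
  done
fi
echo "}"

With NUMBER_OF_FRONTEND_NODEJS_PROCESSES=1 this produces exactly the single-server pool from Method 2 above; with 8 it reproduces the current default block.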

antobinary commented 3 years ago

Linking to the bigbluebutton/bigbluebutton issue https://github.com/bigbluebutton/bigbluebutton/issues/12291

ch9hn commented 3 years ago

We just templated that bug out with Ansible:

upstream poolhtml5servers {
  zone poolhtml5servers 32k;
  least_conn;
  server 10.7.7.200:4100 fail_timeout=10s max_fails=4 backup;
  {% for n in range(vars.meteor_backend_processes + vars.meteor_frontend_processes|int)%}
    server {{ '10.7.7.201' | ipmath(n) }}:{{4101 + n}} fail_timeout=120s max_fails=1;
  {% endfor %}
}
crosscodr commented 3 years ago

Hi @chfxr, you only need the number of meteor_frontend_processes here, don't you? At least, as I understood it after reading the bigbluebutton issue. I came up with the following template:

upstream poolhtml5servers {
  zone poolhtml5servers 32k;
  least_conn;
{% for i in range(bbb_html5_frontend_processes | default(2) | int(2)) %}
  server 127.0.0.1:410{{ i }} fail_timeout=5s max_fails=3;
{% endfor %}
}
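
Rendered with the default of 2 frontend processes, that template expands to:

upstream poolhtml5servers {
  zone poolhtml5servers 32k;
  least_conn;
  server 127.0.0.1:4100 fail_timeout=5s max_fails=3;
  server 127.0.0.1:4101 fail_timeout=5s max_fails=3;
}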