blindsidenetworks / scalelite

Scalable load balancer for BigBlueButton.
GNU Affero General Public License v3.0

redis database gets inconsistent after a while #542

Closed: christmart closed this issue 3 years ago

christmart commented 3 years ago

If you run Scalelite for a while, servers get added and removed. Sometimes a server is powered down without first being removed from Scalelite, possibly while a meeting that Scalelite still has registered is running on it. Over time this makes the Redis database inconsistent. The result is sometimes the "endpoint and security" error when starting a meeting, or even Scalelite trying to reach a server that is no longer powered on.

We have seen inconsistencies in the following keys:

- scalelite:server: <-> scalelite:
- scalelite:server: <-> scalelite:servers
- scalelite:meeting: <-> scalelite:meetings
- scalelite:server_enabled <-> scalelite:server_load
- scalelite:server missing for meeting
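The key pairs above can be cross-checked mechanically. The following is a minimal sketch, not the linked script and not Scalelite code. It assumes a hypothetical layout: scalelite:servers and scalelite:meetings are sets of ids, scalelite:server:<id> and scalelite:meeting:<id> are hashes (the meeting hash carrying a server_id field), scalelite:server_enabled is a set and scalelite:server_load is a sorted set. It also assumes the invariant that only enabled servers carry a load entry. To keep it runnable without a Redis server, it works on a plain dict snapshot; a real check would fill that dict from Redis (for example with SCAN, SMEMBERS, HGETALL and ZRANGE).

```python
# Consistency check over a dict snapshot of Scalelite's Redis keys.
# ASSUMED layout (hypothetical, not taken from Scalelite's source):
#   scalelite:servers          set of server ids
#   scalelite:server:<id>      hash describing one server
#   scalelite:meetings         set of meeting ids
#   scalelite:meeting:<id>     hash for one meeting, with a "server_id" field
#   scalelite:server_enabled   set of enabled server ids
#   scalelite:server_load      sorted set, modeled here as {server id: load}

SERVER_PREFIX = "scalelite:server:"
MEETING_PREFIX = "scalelite:meeting:"


def ids_with_prefix(snapshot, prefix):
    """Collect the <id> part of every key that starts with prefix."""
    return {key[len(prefix):] for key in snapshot if key.startswith(prefix)}


def find_inconsistencies(snapshot):
    """Return human-readable descriptions of mismatched keys."""
    problems = []
    listed_servers = set(snapshot.get("scalelite:servers", ()))
    listed_meetings = set(snapshot.get("scalelite:meetings", ()))
    server_hashes = ids_with_prefix(snapshot, SERVER_PREFIX)
    meeting_hashes = ids_with_prefix(snapshot, MEETING_PREFIX)

    # scalelite:server:<id> <-> scalelite:servers
    for sid in sorted(listed_servers - server_hashes):
        problems.append(f"scalelite:servers lists {sid}, but {SERVER_PREFIX}{sid} is missing")
    for sid in sorted(server_hashes - listed_servers):
        problems.append(f"{SERVER_PREFIX}{sid} exists, but {sid} is not in scalelite:servers")

    # scalelite:meeting:<id> <-> scalelite:meetings
    for mid in sorted(listed_meetings - meeting_hashes):
        problems.append(f"scalelite:meetings lists {mid}, but {MEETING_PREFIX}{mid} is missing")
    for mid in sorted(meeting_hashes - listed_meetings):
        problems.append(f"{MEETING_PREFIX}{mid} exists, but {mid} is not in scalelite:meetings")

    # scalelite:server_enabled <-> scalelite:server_load
    # (assumed invariant: only enabled servers carry a load entry)
    enabled = set(snapshot.get("scalelite:server_enabled", ()))
    loaded = set(snapshot.get("scalelite:server_load", {}))
    for sid in sorted(loaded - enabled):
        problems.append(f"{sid} has a load entry but is not in scalelite:server_enabled")

    # Server missing for meeting
    for mid in sorted(meeting_hashes):
        sid = snapshot[MEETING_PREFIX + mid].get("server_id")
        if sid is not None and sid not in server_hashes:
            problems.append(f"meeting {mid} points to missing server {sid}")

    return problems


if __name__ == "__main__":
    sample = {
        "scalelite:servers": {"s1", "s2"},
        "scalelite:server:s1": {"url": "https://bbb1.example.org/bigbluebutton/api/"},
        "scalelite:server:s3": {"url": "https://bbb3.example.org/bigbluebutton/api/"},
        "scalelite:meetings": {"m1", "m2"},
        "scalelite:meeting:m1": {"server_id": "s1"},
        "scalelite:meeting:m3": {"server_id": "s2"},
        "scalelite:server_enabled": {"s1"},
        "scalelite:server_load": {"s1": 0.0, "s2": 3.0},
    }
    for line in find_inconsistencies(sample):
        print(line)
```

Using set differences in both directions reports each mismatch from both sides: a set entry without its hash, and a hash without its set entry.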

I built a (checkmk) test script to find these inconsistencies:

https://gitlab.rlp.net/zdvsysunix-public/bbb/bbb-helper/-/blob/main/monitoring/redis-scalelite
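A checkmk local check reports each service as one line of the form `<state> <service name> <perfdata> <detail>`, where state 0, 1, 2 and 3 mean OK, WARN, CRIT and UNKNOWN. The sketch below shows how a list of findings could be turned into such a line. It is not taken from the linked script. The service name redis-scalelite and the warn/crit thresholds are hypothetical defaults.

```python
# Turn a list of inconsistency findings into one checkmk local-check line.
# Format: <state> <service name> <perfdata> <detail text>
# state: 0=OK, 1=WARN, 2=CRIT (3=UNKNOWN is not used here).
# Service name and thresholds are hypothetical defaults.


def checkmk_local_line(problems, service="redis-scalelite", warn=1, crit=10):
    count = len(problems)
    if count >= crit:
        state = 2
    elif count >= warn:
        state = 1
    else:
        state = 0
    label = ("OK", "WARN", "CRIT")[state]
    detail = f"{label} - {count} inconsistent key(s)"
    if problems:
        # Keep the status line short: show at most three findings.
        detail += ": " + "; ".join(problems[:3])
    # Perfdata field: metric=value;warn;crit, so checkmk can graph the count.
    return f"{state} {service} inconsistencies={count};{warn};{crit} {detail}"


if __name__ == "__main__":
    print(checkmk_local_line([]))
    print(checkmk_local_line(["meeting m3 points to missing server s2"]))
```

Emitting the count as perfdata, and not only in the text, lets the monitoring system show whether inconsistencies accumulate over time.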

At times we have to delete the conflicting keys every day.
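Rather than deleting keys by hand, the cleanup can be planned as a dry run that only lists the Redis commands to review before executing them (for example with redis-cli). The sketch below assumes the same hypothetical key layout as a plain dict snapshot (sets scalelite:servers, scalelite:meetings, scalelite:server_enabled; sorted set scalelite:server_load; hashes scalelite:server:<id> and scalelite:meeting:<id> with a server_id field). Scalelite may keep other keys that reference a meeting or server, and this sketch does not cover them.

```python
# Dry-run cleanup plan: list Redis commands that would drop dangling
# references. Nothing is executed. Key layout is a hypothetical assumption.

SERVER_PREFIX = "scalelite:server:"
MEETING_PREFIX = "scalelite:meeting:"


def cleanup_commands(snapshot):
    """Return Redis commands (as text) that would remove dangling references."""
    known_servers = {k[len(SERVER_PREFIX):] for k in snapshot if k.startswith(SERVER_PREFIX)}
    commands = []

    # Server ids referenced from index keys but without a server hash.
    for sid in sorted(set(snapshot.get("scalelite:servers", ())) - known_servers):
        commands.append(f"SREM scalelite:servers {sid}")
    for sid in sorted(set(snapshot.get("scalelite:server_enabled", ())) - known_servers):
        commands.append(f"SREM scalelite:server_enabled {sid}")
    for sid in sorted(set(snapshot.get("scalelite:server_load", {})) - known_servers):
        commands.append(f"ZREM scalelite:server_load {sid}")

    # Meetings whose server no longer exists: drop the hash and the index entry.
    for key in sorted(k for k in snapshot if k.startswith(MEETING_PREFIX)):
        sid = snapshot[key].get("server_id")
        if sid is not None and sid not in known_servers:
            mid = key[len(MEETING_PREFIX):]
            commands.append(f"DEL {key}")
            commands.append(f"SREM scalelite:meetings {mid}")
    return commands


if __name__ == "__main__":
    sample = {
        "scalelite:servers": {"s1", "s2"},
        "scalelite:server:s1": {},
        "scalelite:server_enabled": {"s2"},
        "scalelite:server_load": {"s1": 0.0, "s2": 1.0},
        "scalelite:meetings": {"m1", "m2"},
        "scalelite:meeting:m1": {"server_id": "s1"},
        "scalelite:meeting:m2": {"server_id": "s2"},
    }
    print("\n".join(cleanup_commands(sample)))
```

Keeping the plan separate from execution makes it possible to inspect what would be removed before touching a production database.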

git-lama commented 3 years ago

@christmart Before removing a server from SL (Scalelite) while it still has active meetings, you must run servers:panic to remove the existing meetings on that server. I believe skipping that step is what's causing the inconsistency for you. A few changes are coming to SL so that servers:remove will handle this case itself.
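The ordering described in this reply (clear a server's meetings first, then remove the server) can be illustrated with a small in-memory model. This is a hypothetical sketch, not Scalelite code. FakeStore, panic and remove_server are illustrative names, and the model tracks only the meeting-to-server references that the reply says a panic step removes.

```python
# Hypothetical in-memory model of "panic first, then remove".
# It tracks only meeting -> server references; it is not Scalelite code.


class FakeStore:
    def __init__(self, servers, meetings):
        self.servers = set(servers)      # registered server ids
        self.meetings = dict(meetings)   # meeting id -> server id

    def panic(self, server_id):
        """Drop every meeting still assigned to server_id; return their ids."""
        stale = sorted(m for m, s in self.meetings.items() if s == server_id)
        for mid in stale:
            del self.meetings[mid]
        return stale

    def remove_server(self, server_id, panic_first=True):
        """Remove a server, optionally clearing its meetings first."""
        if panic_first:
            self.panic(server_id)
        self.servers.discard(server_id)

    def dangling_meetings(self):
        """Meetings that reference a server which is no longer registered."""
        return sorted(m for m, s in self.meetings.items() if s not in self.servers)


if __name__ == "__main__":
    unsafe = FakeStore({"s1", "s2"}, {"m1": "s1", "m2": "s2"})
    unsafe.remove_server("s1", panic_first=False)
    print(unsafe.dangling_meetings())  # ['m1']

    safe = FakeStore({"s1", "s2"}, {"m1": "s1", "m2": "s2"})
    safe.remove_server("s1")
    print(safe.dangling_meetings())    # []
```

Removing the server without the panic step leaves m1 pointing at a server that no longer exists, which corresponds to the "server missing for meeting" case in the original report.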

christmart commented 3 years ago

I also get inconsistencies in Redis when I remove a server that has no meetings left on it. This does not cause an error in Scalelite, but the Redis data is still inconsistent. The other case is a BBB server that crashes while meetings are still running on it. The meeting then cannot be restarted on another server, because Scalelite does not clean up Redis and keeps trying to connect to the crashed server. The log shows an error like "no server for meeting" or similar.
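For the crash case, one possible repair is to treat a meeting whose recorded server is gone or offline as stale: drop the mapping and assign the meeting to another online server. The sketch below is hypothetical and is not how Scalelite resolves meetings. The function name, the plain-dict inputs, and the least-loaded selection rule are assumptions for illustration, and actually recreating the meeting on the new server (the BBB create call) is left to the caller.

```python
# Hypothetical sketch: repair a stale meeting -> server mapping on lookup.


def server_for_meeting(meeting_id, meetings, server_load, online):
    """Return a usable server id for meeting_id, repairing a stale mapping.

    meetings:    dict meeting id -> server id (mutated in place)
    server_load: dict server id -> current load
    online:      set of server ids that passed the latest health check
    """
    current = meetings.get(meeting_id)
    if current is not None and current in server_load and current in online:
        return current
    # The recorded server is gone or offline: forget the mapping instead of
    # trying to reach it again.
    meetings.pop(meeting_id, None)
    candidates = [s for s in server_load if s in online]
    if not candidates:
        raise LookupError(f"no online server available for meeting {meeting_id}")
    # Pick the least-loaded online server (ties broken by id for determinism).
    chosen = min(candidates, key=lambda s: (server_load[s], s))
    meetings[meeting_id] = chosen
    return chosen


if __name__ == "__main__":
    meetings = {"m1": "s1", "m2": "s2"}
    load = {"s1": 1.0, "s2": 5.0, "s3": 2.0}
    online = {"s2", "s3"}  # s1 has crashed
    print(server_for_meeting("m2", meetings, load, online))  # s2
    print(server_for_meeting("m1", meetings, load, online))  # s3
```

A healthy mapping is returned unchanged, so the repair only kicks in for meetings whose server has disappeared or failed its health check.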

git-lama commented 3 years ago

@christmart Thanks for pointing this out. We have been working on fixes for the Redis inconsistency issues, and they will be released soon.