Hubs-Foundation / hubs-cloud

Resources for self hosted Hubs Cloud instances
Mozilla Public License 2.0

Enterprise multi-server setup: object in one app server's database is missing in the other returning 401 #89

Closed robinkwilson closed 3 years ago

robinkwilson commented 4 years ago

From a user via hubs@.

Environment

Issues: 401 errors are returned for some objects. An object spawns in one room but returns 401 in another, likely because one request is handled by the app server that has the object in its database and another by a server that does not. We consistently get 401 errors when we deploy more than one instance. After further digging, we found that some of the objects we create on one app instance do not get replicated into the other instances' PostgreSQL databases. We have enabled sticky sessions on the load balancer and opened the security group to all/all for all IP addresses inside the VPC, but that did not fix the issue.

Expected: All objects added to the cluster should be found.

misslivirose commented 3 years ago

This is a serious issue for any sizable deployment that scales or needs a redundancy plan. Essentially, media is likely to fail to load for some percentage of users, depending on the app server they are connected to (we still need to determine the root cause and how frequently this happens).

gfodor commented 3 years ago

This may be the rate limiting being triggered, but it's hard to say. The rate limiting is node-local, and I'm not sure offhand whether media loading can fall into the rate-limited control flow. The theory proposed in the original submission is not right, because there is a single shared database between all nodes.
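
For illustration only, here is a minimal sketch of what "node-local" rate limiting could mean in practice. It is not taken from the Hubs Cloud code: the class name, the fixed-window scheme, the limit of 3, and the client key are all hypothetical. The point is only that when each app server keeps its own counter in memory, the same client can be rejected on one node and accepted on another, even though every node shares one database.

```python
from collections import defaultdict


class NodeLocalRateLimiter:
    """Hypothetical fixed-window limiter held in a single node's memory."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (client, window index) -> number of requests accepted in that window
        self._counts = defaultdict(int)

    def allow(self, client, now):
        """Return True if this node still has budget for the client."""
        key = (client, int(now // self.window_seconds))
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True


# Two app servers behind a load balancer, each with its own private counter.
node_a = NodeLocalRateLimiter(limit=3, window_seconds=60)
node_b = NodeLocalRateLimiter(limit=3, window_seconds=60)

# The client sends five requests to node A and exhausts its budget there.
results_a = [node_a.allow("client-1", now=t) for t in range(5)]

# The next request lands on node B, which has seen nothing from this client.
result_b = node_b.allow("client-1", now=5)

print(results_a)  # [True, True, True, False, False]
print(result_b)   # True
```

If something like this is in play, failures would depend on which node serves a request rather than on missing data, which would look similar to the symptom reported above. Whether the real limiter behaves this way, and what status code it returns, would need to be checked against the actual deployment.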

brianpeiris commented 3 years ago

This might be a duplicate of #116, though it's odd that the description mentions 401 errors, specifically. #116 presents with 500/504 errors.