Open JesusRo opened 3 years ago
@mkilchhofer, maybe you have insights on what we need to do to make multiple replicas behave better. It seems it's trying to insert the group multiple times.
I think the application developers need to make sure the software itself supports multiple replicas (leader election, app clustering, etc.).
Nebraska is using an ORM (object-relational mapper), right? Maybe this component keeps data in memory and already returns 200 OK to the client before it has been written to the database?
IMHO, the only thing we can do on the Kubernetes side to mitigate the most critical pain points is to use sticky sessions on the Ingress, but the configuration depends on which Ingress controller the end users are running. For ingress-nginx:
ingress:
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "INGRESSCOOKIE"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
Edit: Oh, the main problem here seems to be the syncer. I think we need app clustering here, or at least something that ensures only one instance syncs from the upstream repo.
Hi @mkilchhofer, I tested sticky sessions (ingress-nginx controller) with 3 replicas of Nebraska. It was fine for a while, but after more intensive usage (creating groups, adding VMs, etc.) the DB blew up: I started to see locked queries/inserts/updates, warnings/errors about foreign keys, and similar messages from Nebraska about "Duplicated keys".
Could it be that everything was fine until the syncer kicked in? I will try to bring up a new environment without it and retest. I'm also curious whether I could configure some kind of active/standby setup on the ingress controller: even without load balancing, that would increase availability if a node runs into trouble.
Anyhow, if the app is not actually ready for HA, I would consider this more of an enhancement than an issue per se.
thanks!
Description
Errors occur randomly when the deployment has more than one replica.
Impact
Operations are not performed (for example, creating a group).
Environment and steps to reproduce
Expected behavior
Operations are applied correctly.
Additional information