OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License

In kubernetes - application states sync is delayed #14373

Open ns8482e opened 1 year ago

ns8482e commented 1 year ago

Describe the bug

Deploy OC in a Kubernetes environment such as AKS, Kubernetes in Docker Desktop, or minikube.

To Reproduce

Expected behavior

The up-to-date state of the tenant should be reflected in the post/redirect scenario even when the redirected request is not served by the same pod.

Screenshots

[Screenshot: oc-kubernetes]

@sebastienros @jtkech @MikeAlhayek

MikeAlhayek commented 1 year ago

@ns8482e as @jtkech mentioned in #11501, the current state can be about 10 seconds behind. The POST request of the current user could be handled by a different node that has not yet restarted.

An easy way to fix this is to use the existing RedisBus to dispatch an event to all the nodes so they can restart. Do you want to attempt implementing this in OC?
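For illustration, a minimal sketch of that idea, assuming OC's IMessageBus abstraction (PublishAsync/SubscribeAsync, implemented by RedisBus when the Redis Bus feature is enabled) and IShellHost; the channel name and the wiring are invented for this example:

```csharp
using System.Threading.Tasks;
using OrchardCore.Environment.Cache;   // IMessageBus (implemented by RedisBus)
using OrchardCore.Environment.Shell;   // IShellHost, ShellSettings

// Sketch only: broadcast a "reload" message so every node releases the tenant
// immediately instead of waiting for the next distributed-cache poll.
public class ShellReloadNotifier
{
    private const string Channel = "ShellReload"; // invented channel name

    private readonly IMessageBus _messageBus;
    private readonly IShellHost _shellHost;

    public ShellReloadNotifier(IMessageBus messageBus, IShellHost shellHost)
    {
        _messageBus = messageBus;
        _shellHost = shellHost;
    }

    // Call once at startup: react to reload messages published by other nodes.
    public Task ListenAsync()
        => _messageBus.SubscribeAsync(Channel, (channel, tenantName) =>
        {
            if (_shellHost.TryGetSettings(tenantName, out var settings))
            {
                // Fire and forget: reload the local shell for this tenant.
                _ = _shellHost.ReloadShellContextAsync(settings, eventSource: false);
            }
        });

    // Call from the node that changed the tenant, when it restarts its shell.
    public Task NotifyAsync(string tenantName)
        => _messageBus.PublishAsync(Channel, tenantName);
}
```

Where exactly to publish and subscribe is the open question discussed below.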

ns8482e commented 1 year ago

It would be nice to have the state in real time when we use a stateless deployment in k8s and scale pods up/down depending on load.

jtkech commented 1 year ago

For info, I was wrong: there is no polling time, only an idle time, of 1 second.

Maybe worth trying it without an idle time, or with a smaller one.

https://github.com/OrchardCMS/OrchardCore/blob/81ca23b1381a7385aadb29d23f2f24dacd4dc32e/src/OrchardCore/OrchardCore/Shell/Distributed/DistributedShellHostedService.cs#L26-L28
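For readers not following the link, those lines declare the timing fields; reconstructed here from the values discussed later in this thread (the exact source may differ):

```csharp
using System;

// Reconstructed from this thread's discussion; see the linked source for the
// authoritative declarations in DistributedShellHostedService.
internal static class DistributedShellTimings
{
    public static readonly TimeSpan MinIdleTime = TimeSpan.FromSeconds(1);  // idle time between tenant lookups
    public static readonly TimeSpan MaxBusyTime = TimeSpan.FromSeconds(2);  // lookup duration that triggers backoff
    public static readonly TimeSpan MaxRetryTime = TimeSpan.FromMinutes(1); // cap for the doubled idle time
}
```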

MikeAlhayek commented 1 year ago

@ns8482e I think it's a must, not just nice to have. I think the IMessageBus is easy to use to fire an event before we restart the node. I don't know the best place to listen for this event; if you don't implement this, I may take a stab at it.

jtkech commented 1 year ago

Just some thoughts and info.

We also have a DistributedSignal (when Redis.Bus is enabled) which is easier to use and uses the message bus internally. But as I remember, @sebastienros was not a fan of using a message bus, which in any case adds a process that loops to listen for events.

Maybe a solution around the ShellDescriptor; I already thought about it, but it is sensitive. Maybe a volatile document manager to relate the ShellDescriptor to a shared distributed document identifier.

Note: in the regular case we only reload a tenant on a Features change.

I don't remember the details, but we also have Tags that can be invalidated based on contexts, and we have a Features hash context; normally this becomes distributed with Redis. But I'm not sure it is a good way.

So yes, maybe worth first trying to reduce the idle time, and maybe making it configurable, as sketched below.
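If the idle time were made configurable, it could look something like this; DistributedShellOptions and the "OrchardCore:DistributedShell" section are hypothetical names, not an existing OC API:

```csharp
using System;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical options type for illustration only; OC has no such type today.
public class DistributedShellOptions
{
    public TimeSpan MinIdleTime { get; set; } = TimeSpan.FromSeconds(1);
}

public static class DistributedShellServiceCollectionExtensions
{
    // Bind from a hypothetical "OrchardCore:DistributedShell" config section.
    public static IServiceCollection AddDistributedShellOptions(
        this IServiceCollection services, IConfiguration configuration)
        => services.Configure<DistributedShellOptions>(
            configuration.GetSection("OrchardCore:DistributedShell"));
}
```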

ns8482e commented 1 year ago

@jtkech it looks like the redirect is served by the second node before the module is actually disabled and the shell is released by the hosted service.

jtkech commented 1 year ago

Yes, good point, but normally the new shell descriptor is already persisted, so maybe it's still worth trying to just decrease (not too much) the _minIdleTime in DistributedShellHostedService.

ns8482e commented 1 year ago

0.5 sec can't beat the redirect. The redirect happens before the change.

sebastienros commented 1 year ago

Is there a way to tell k8s not to start a new instance, since we are only restarting? Like make it assume the instance is actually healthy, or "alive"?

jtkech commented 1 year ago

If a k8s node is configured to run n instances of a given image, k8s will take care of keeping n instances running; what we need here is some affinity.

> 0.5 sec can't beat the redirect. The redirect happens before the change.

I will think about it. Do you have any ingress proxy that could be configured for affinity?

The Microsoft Yarp.ReverseProxy that I'm using in #13633 has a Docker image; as I remember, it uses affinity by default. Hmm, I don't remember exactly, but maybe you can configure it to use affinity only for some path(s) and/or only for some path patterns.

ns8482e commented 1 year ago

@sebastienros yes, k8s could handle that using readiness. However, it's not appropriate to stop sending requests to a pod for all tenants when only one tenant's state is out of sync.

The issue here is the feedback from Redis: the second node receives the redirect request before the reload id from Redis.

This is happening because the redirect occurs before the deferred task runs that actually disables the feature (see the sketch below).
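For context, the pattern in question looks roughly like this: work registered through ShellScope.AddDeferredTask runs after the current request's scope completes, i.e. after the 302 has already been sent to the browser. A sketch; the task body is a placeholder, not OC's actual feature-toggling code:

```csharp
using System.Threading.Tasks;
using OrchardCore.Environment.Shell.Scope;

public class DeferredDisableExample
{
    public Task DisableFeatureAsync()
    {
        // Registered now, executed only after the request scope ends,
        // so the redirect response is sent before this code runs.
        ShellScope.AddDeferredTask(async scope =>
        {
            // Placeholder for the service call that actually disables the feature.
            await Task.CompletedTask;
        });

        return Task.CompletedTask;
    }
}
```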

@jtkech in k8s I am using the nginx ingress class.
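With the nginx ingress class, cookie-based session affinity can keep the POST and the subsequent redirect on the same pod while the other nodes catch up. A minimal sketch; annotation names per the ingress-nginx docs, host and service names are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orchardcore
  annotations:
    # Pin each client to one backend pod via a cookie.
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "oc-route"
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orchardcore
                port:
                  number: 80
```

Note this only pins the user who made the change to one pod; the other pods still converge through the DistributedShellHostedService loop.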

jtkech commented 1 year ago

Just for info, after seeing the recent meeting talking about this.

ns8482e commented 11 months ago

@jtkech After digging into DistributedShellHostedService, it looks like the following code is delaying the reload, as it waits two minutes between each tenant load. What's the significance of waiting between reloading each tenant inside the foreach loop?

https://github.com/OrchardCMS/OrchardCore/blob/fa2a9b241806e5c81571ff17fad5b3d3cef5ebb0/src/OrchardCore/OrchardCore/Shell/Distributed/DistributedShellHostedService.cs#L215-L218

jtkech commented 11 months ago

@ns8482e

Normally, every 1s we look up tenants (_minIdleTime) to keep them in sync as needed. If a lookup takes more than 2s (_maxBusyTime) (maybe a little short), the service waits for 1s (_minIdleTime) again before continuing the lookup. If that happens again, the idle time is multiplied by 2, but capped at 1 min (_maxRetryTime).

That said, this hosted service only uses the IDistributedCache of the Default tenant, so normally reloading another tenant should not impact the loop, unless reloading that tenant locally takes a long time.

But there may be an issue, let me know if you find something.
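To make that timing concrete, here is a sketch of the loop shape described above; it is not the actual DistributedShellHostedService code, just the idle/busy/backoff behavior as explained:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hedged sketch of the described loop; see the linked source for the real code.
public class SyncLoopSketch
{
    private static readonly TimeSpan _minIdleTime = TimeSpan.FromSeconds(1);
    private static readonly TimeSpan _maxBusyTime = TimeSpan.FromSeconds(2);
    private static readonly TimeSpan _maxRetryTime = TimeSpan.FromMinutes(1);

    public async Task RunAsync(Func<Task> lookupTenantsAsync, CancellationToken stoppingToken)
    {
        var idleTime = _minIdleTime;
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await Task.Delay(idleTime, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break;
            }

            var stopwatch = Stopwatch.StartNew();
            await lookupTenantsAsync();   // sync tenants from the distributed cache

            // If the lookup was slow, double the idle time (capped); otherwise reset it.
            idleTime = stopwatch.Elapsed > _maxBusyTime
                ? TimeSpan.FromTicks(Math.Min(idleTime.Ticks * 2, _maxRetryTime.Ticks))
                : _minIdleTime;
        }
    }
}
```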