ns8482e opened this issue 1 year ago
@ns8482e as @jtkech mentioned in #11501, the current state could be about 10 seconds behind. The POST request of the current user could be handled by a different node that has not yet restarted.
An easy way to fix this is to use the existing RedisBus to dispatch an event to all the nodes so they can restart. Do you want to attempt implementing this in OC?
It's nice to have the state in real time when we use stateless deployments in k8s and scale pods up/down depending on load.
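The broadcast idea above can be sketched with a toy in-process bus; `RedisBus` would play the same role across nodes. This is a simplified model, not OrchardCore code, and all names here are illustrative:

```python
# Minimal model of the proposed fix: the node handling the POST publishes a
# "release tenant" event, and every subscribed node releases its local shell
# immediately instead of waiting for the next sync tick.

class Bus:
    """In-process stand-in for a distributed message bus (e.g. RedisBus)."""
    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def publish(self, channel, message):
        for handler in self.handlers:
            handler(channel, message)

class ShellHost:
    """Each node reacts to "shell" events by releasing the named tenant."""
    def __init__(self, name, bus):
        self.name = name
        self.released = []
        bus.subscribe(self.on_message)

    def on_message(self, channel, message):
        if channel == "shell":
            self.released.append(message)  # release the tenant locally

bus = Bus()
nodes = [ShellHost(f"node{i}", bus) for i in range(3)]
bus.publish("shell", "tenant1")  # fired by the node that handled the POST
assert all(n.released == ["tenant1"] for n in nodes)
```

The point of the pattern is that invalidation becomes push-based and near-instant, rather than bounded by the polling interval of the hosted service.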
For info, I was wrong: there is no polling time, only an idle time of 1 second.
Maybe worth trying it without the idle time, or with a smaller one.
@ns8482e I think it's a must, not just nice to have. I think the `IMessageBus` is easy to use to fire an event before we restart the node. I don't know the best place to listen to this event; if you don't implement this, I may take a stab at it.
Just some thoughts and info.
We also have a `DistributedSignal` (when `Redis.Bus` is enabled) which is easier to use and uses the message bus internally. But as I remember, @sebastienros was not a fan of using a message bus, which anyway adds a process that loops to listen for events.
Maybe a solution around the `ShellDescriptor`. I already thought about it, but it is sensitive. Maybe a volatile document manager to relate the `ShellDescriptor` with a shared distributed `Document` identifier.
Note: In a regular case we only reload a tenant on a Features change.
I don't remember the details, but we also have Tags that can be invalidated based on contexts, and we have a Features hash context; normally this becomes distributed with Redis. But I'm not sure it is a good way.
So yes, maybe worth first trying to reduce the idle time, and maybe making it configurable.
@jtkech it looks like the redirect is served by a second node before the module is actually disabled and the tenant is released by the hosted service.
Yes, good point, but normally the new shell descriptor is already persisted, so maybe still worth trying to just decrease (not too much) the `_minIdleTime` in `DistributedShellHostedService`.
0.5 sec can't beat the redirect. The redirect happens before the change.
Is there a way to tell k8s not to start a new instance since we are only restarting? Like make it assume the instance is actually healthy, or "alive"?
If a k8s node is configured to have n instances of a given image, k8s will take care to keep n instances running; here we need some affinity.
> 0.5 sec can't beat the redirect. The redirect happens before the change.
I will think about it. Do you have any ingress proxy that could be configured for affinity?
The Microsoft Yarp.ReverseProxy that I'm using in #13633 has a Docker image; as I remember it uses affinity by default. Hmm, I don't remember, but maybe you can configure it to use affinity only for some path(s), and/or maybe only for some path patterns.
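For illustration, Yarp supports session affinity per cluster in its configuration, and routes can be scoped to path patterns. A minimal sketch of what that might look like; the route/cluster names, paths, and destination addresses here are hypothetical, and the exact option names should be verified against the Yarp documentation:

```json
{
  "ReverseProxy": {
    "Routes": {
      "adminRoute": {
        "ClusterId": "ocCluster",
        "Match": { "Path": "/admin/{**catch-all}" }
      }
    },
    "Clusters": {
      "ocCluster": {
        "SessionAffinity": {
          "Enabled": true,
          "Policy": "Cookie",
          "AffinityKeyName": ".Yarp.Affinity"
        },
        "Destinations": {
          "oc1": { "Address": "http://oc-web-0/" },
          "oc2": { "Address": "http://oc-web-1/" }
        }
      }
    }
  }
}
```

With a cookie-based policy, the redirected GET after a POST would land on the same pod that handled the POST, sidestepping the stale-state window on other pods, at the cost of pinning those paths to one instance.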
@sebastienros yes, k8s could handle that using readiness. However it's not appropriate to stop sending requests to a pod for all tenants when only one tenant's state is out of sync.
The issue here is the feedback from Redis: the second node receives the redirect request before the reload id from Redis.
This is happening because the redirect happens before the deferred task that actually disables the feature runs.
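The race described above can be modeled with a toy timeline: the POST is answered with a redirect immediately, but other nodes only notice the change on their next sync tick. All names and timings here are illustrative, not OrchardCore code:

```python
import math

def redirect_beats_sync(change_at: float, rtt: float = 0.05,
                        sync_period: float = 1.0) -> bool:
    """True if the redirected GET reaches another node before its next
    sync tick, i.e. while that node still has stale state."""
    # The other node picks up the change at the first tick >= change time.
    next_tick = math.ceil(change_at / sync_period) * sync_period
    get_arrives_at = change_at + rtt       # browser follows the redirect fast
    return get_arrives_at < next_tick      # stale state if the GET wins

assert redirect_beats_sync(change_at=0.3) is True    # GET at 0.35 s, tick at 1 s
assert redirect_beats_sync(change_at=1.0) is False   # change lands on a tick
```

Since a browser typically follows a redirect in tens of milliseconds while the sync period is around a second, the GET almost always wins, which is why shrinking `_minIdleTime` alone cannot close the window.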
@jtkech in k8s I am using the nginx ingress class.
Just for info, after seeing the recent meeting where this was discussed.
Here I want to point out that there are shared `Documents` managed by the `DocumentManager` using Redis that normally are not delayed, and there are tenant `States`.
Yes, when we update Admin Settings or enable/disable a feature, the tenant is released (not reloaded) and then will be rebuilt, and yes there is a delay before this tenant is released on other instances.
But for Admin Settings, even if the tenant is not yet released, the Settings are normally in sync because they are managed by the `DocumentManager`, which uses shared `Document` identifiers from Redis.
Note: We also use an in-memory micro cache of 1s, but it is configurable.
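The shared-identifier mechanism described above can be sketched roughly as follows. This is a simplified model of the idea, not the actual `DocumentManager` API; all names are illustrative:

```python
import uuid

class SharedStore:
    """Stands in for Redis: holds the latest document and its identifier."""
    def __init__(self, doc):
        self.doc = doc
        self.identifier = uuid.uuid4().hex

    def update(self, doc):
        self.doc = doc
        self.identifier = uuid.uuid4().hex  # bump id so other nodes invalidate

class Node:
    """Each node keeps a local cache tagged with the identifier it loaded."""
    def __init__(self, store):
        self.store = store
        self.cached_doc = None
        self.cached_id = None

    def get_document(self):
        # The local copy is only trusted while its identifier still matches
        # the shared one; otherwise reload from the shared store.
        if self.cached_id != self.store.identifier:
            self.cached_doc = self.store.doc
            self.cached_id = self.store.identifier
        return self.cached_doc

store = SharedStore({"theme": "dark"})
node_a, node_b = Node(store), Node(store)
assert node_b.get_document() == {"theme": "dark"}
store.update({"theme": "light"})            # e.g. node A saves Admin Settings
assert node_b.get_document() == {"theme": "light"}  # node B sees it without a reload
```

This is why Admin Settings stay in sync even before the tenant is released: the data travels through identifier checks, not through a tenant restart.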
About changing feature states, one solution would be to also manage the `ShellDescriptor` as a shared `Document`, or a light `Document` only holding an identifier, as we do for `AutorouteEntries`.
I thought about it, but the `ShellDescriptor` was already managed differently, by using a `SerialNumber`, so even when we updated it, we kept the underlying pattern.
Yes, the `ShellDescriptor` only uses a local cache, and it is refreshed from the database only when the tenant is released, which is not the case for other `Documents`; maybe worth doing something.
As said, with the Yarp proxy you can use affinity for only some path patterns.
@jtkech After digging into `DistributedShellHostedService`, it looks like the following code is delaying the reload, as it waits two minutes between each tenant load. What's the significance of waiting between reloading each tenant inside the foreach loop?
@ns8482e
Normally, every 1s we look up tenants (`_minIdleTime`) to keep them in sync as needed. If the lookup takes more than 2s (`_maxBusyTime`) (maybe a little short), it will wait again for 1s (`_minIdleTime`) before continuing the lookup. If it happens again, the idle time is multiplied by 2, but limited to 1 min (`_maxRetryTime`).
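Based on that description, the idle/backoff behavior would look roughly like this. This is a sketch of the comment's explanation, not the actual `DistributedShellHostedService` code:

```python
MIN_IDLE = 1.0    # _minIdleTime: 1 s between lookups
MAX_BUSY = 2.0    # _maxBusyTime: a lookup over 2 s counts as an overrun
MAX_RETRY = 60.0  # _maxRetryTime: idle time never grows beyond 1 min

def idle_schedule(busy_times):
    """Return the idle delay chosen after each lookup, given how long
    each lookup took (per the behavior described above)."""
    idle = MIN_IDLE
    overrunning = False
    delays = []
    for busy in busy_times:
        if busy <= MAX_BUSY:
            idle, overrunning = MIN_IDLE, False  # healthy: 1 s cadence
        elif not overrunning:
            idle, overrunning = MIN_IDLE, True   # first overrun: wait 1 s again
        else:
            idle = min(idle * 2, MAX_RETRY)      # repeated overruns: double, cap 1 min
        delays.append(idle)
    return delays

# One slow lookup keeps the 1 s wait; repeated slow lookups double the wait.
assert idle_schedule([0.5, 3.0, 3.0, 3.0]) == [1.0, 1.0, 2.0, 4.0]
assert idle_schedule([3.0] * 10)[-1] == 60.0   # capped at _maxRetryTime
```

So the minute-scale waits observed in the foreach loop would only occur after repeated slow lookups; in the healthy case the cadence stays at 1 second.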
That said, this hosted service only uses the `IDistributedCache` of the Default tenant, so normally reloading another tenant should not impact the loop, unless reloading this tenant locally takes a long time.
But there may be an issue, let me know if you find something.
**Describe the bug**

Deploy OC in a kubernetes environment like AKS, kubernetes in Docker Desktop, or minikube.

**To Reproduce**

- Create the folders `C:\k8s\oc\1.7` and `C:\k8s\redis`.
- Save `k8s-oc17.yml` at `c:\k8s` and run `kubectl apply -f ./k8s-oc17.yml`. This will create one pod for redis and one for oc 1.7.
- Run `kubectl get pods` and validate that the redis and oc pods are running.
- Run `kubectl scale --replicas=3 deployment/oc-web`, then `kubectl get pods` to see 3 pods running.

**Expected behavior**

The up-to-date state of the tenant should be reflected in the post/redirect scenario without the request being served by the same pod.
**Screenshots**
@sebastienros @jtkech @MikeAlhayek