bordeuax opened this issue 4 years ago
The PrometheusBasedResourceLimitChecks class determines the number of currently connected devices by querying the Prometheus server. So, if the data in the Prometheus server is stale and indicates that the maximum number of connections is used up, no additional devices will be able to connect. To reduce the lag until new connections are possible again, you could increase the frequency at which the Prometheus server scrapes the adapters. However, it probably doesn't make much sense to scrape, say, every 2 seconds.
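To make the lag concrete, here is a minimal Python sketch, assuming a simplified model in which Prometheus scrapes the adapter's connection gauge exactly at t = 0, interval, 2*interval, ... and a limit query always sees the value from the most recent scrape. The names ScrapedConnectionGauge, connection_allowed and first_visible_at are hypothetical and are not part of Hono's PrometheusBasedResourceLimitChecks, whose actual PromQL query is not reproduced here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScrapedConnectionGauge:
    """Hypothetical model of a connection gauge that Prometheus samples only at scrape time.

    Assumes scrapes happen exactly at t = 0, interval, 2*interval, ...
    """
    scrape_interval_s: float
    # (time_s, connection_count) change events, sorted by time
    changes: List[Tuple[float, int]] = field(default_factory=list)

    def reported_count(self, now_s: float) -> int:
        """Count a query would see at now_s: the value as of the most recent scrape."""
        last_scrape = math.floor(now_s / self.scrape_interval_s) * self.scrape_interval_s
        count = 0
        for t, c in self.changes:
            if t <= last_scrape:
                count = c
        return count


def connection_allowed(reported_count: int, max_connections: int) -> bool:
    """Accept a new device only if the (possibly stale) count is below the limit."""
    return reported_count < max_connections


def first_visible_at(change_time_s: float, scrape_interval_s: float) -> float:
    """Earliest scrape time at which a change made at change_time_s becomes visible."""
    return math.ceil(change_time_s / scrape_interval_s) * scrape_interval_s


# A device disconnects at t = 3 s, freeing one of 10 slots; illustrative 15 s scrape interval.
gauge = ScrapedConnectionGauge(15.0, [(0.0, 10), (3.0, 9)])
print(connection_allowed(gauge.reported_count(5.0), 10))   # stale value 10: rejected
print(connection_allowed(gauge.reported_count(15.0), 10))  # next scrape sees 9: accepted
```

In this model the worst-case lag is one full scrape interval: a slot freed at t = 3 s becomes usable at t = 15 s with a 15 s interval, or at t = 4 s with a 2 s interval, which is exactly the trade-off against scraping the adapters far more often.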
To address the controlled rolling update scenario, we could probably improve the shutdown process of the protocol adapters:
WDYT? @kaniyan @bordeuax
@sophokles73, this proposal regarding graceful shutdown will make the behavior more predictable. But I have one remark regarding the first point:
1. reject any new connection attempts from devices
I think in this case we need to use health indicators like /liveness and /readiness. We need to keep /readiness in KO status and /liveness in OK status; the load balancer (whose health check points to /readiness) will then redirect new traffic to the new pods.
If we reject the connections at the pod level, this will not help, because the LB will continue to forward traffic to this pod (at least, the probability is very high that new traffic will still go to the old pod), and we will see exactly the same behavior of rejected connections.
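The readiness/liveness split suggested here can be sketched with Python's standard http.server module. This is a minimal sketch, assuming a hypothetical AdapterHealth object that the adapter flips into a draining state when shutdown begins (here triggered by SIGTERM) and an assumed health port of 8088; it is not Hono's actual health check server. While draining, /readiness answers 503 (KO) so a load balancer probing it stops sending new connections to the pod, whereas /liveness keeps answering 200 (OK) so the pod is not restarted while existing connections wind down.

```python
import signal
import threading
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer


class AdapterHealth:
    """Hypothetical health state of a protocol adapter pod."""

    def __init__(self):
        self._draining = threading.Event()

    def begin_shutdown(self):
        # Mark the adapter as draining; new traffic should go to other pods.
        self._draining.set()

    def status_for(self, path):
        if path == "/liveness":
            # Process is still healthy: stay OK so it is not restarted while draining.
            return HTTPStatus.OK
        if path == "/readiness":
            # KO while draining, so the load balancer stops routing new connections here.
            return HTTPStatus.SERVICE_UNAVAILABLE if self._draining.is_set() else HTTPStatus.OK
        return HTTPStatus.NOT_FOUND


def make_handler(health):
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(health.status_for(self.path))
            self.send_header("Content-Length", "0")
            self.end_headers()

    return HealthHandler


if __name__ == "__main__":
    health = AdapterHealth()
    # Flip readiness to KO on SIGTERM; closing existing connections is out of scope here.
    signal.signal(signal.SIGTERM, lambda signum, frame: health.begin_shutdown())
    # Port 8088 is an assumption for this sketch.
    HTTPServer(("0.0.0.0", 8088), make_handler(health)).serve_forever()
```

Using 503 rather than closing the listener keeps the probe itself answerable: Kubernetes HTTP probes, for example, treat any status from 200 to 399 as success and anything else as failure, so the pod is simply marked not ready instead of being considered dead.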