bf2fc6cc711aee1a0c2a / kas-fleetshard

The kas-fleetshard-operator is responsible for provisioning and managing instances of kafka on a cluster. The kas-fleetshard-synchronizer synchronizes the state of a fleet shard with the kas-fleet-manager.
Apache License 2.0

MGDSTRM-10277 hardening the sync against unhealthy behavior #843

Closed shawkins closed 1 year ago

shawkins commented 1 year ago

This change is purely an attempt to be more defensive against what was seen in MGDSTRM-10277. However, the root cause remains completely unknown: I didn't see anything exceptional called out. It's also not clear why only the sync would be affected. We've moved up far enough in the operator SDK that it now uses an informer for the primary controller resource (managedkafka), which should behave exactly the same as the informer the sync is running.

shawkins commented 1 year ago

Also, this is not fully defensive: presumably because of missed events or invalid cache state, the control plane was not receiving status updates either.

If we end up seeing more frequent sync restarts due to these changes, we'll have to find the root cause and/or go back to using something like a timed restart job to ensure that the watches are fresh.
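
For illustration only, here is a minimal sketch of one way to detect a watch that has gone quiet, as an alternative to restarting on a fixed timer. It is not the code in this PR. The class name WatchFreshnessMonitor, its methods, and the five-minute threshold are all hypothetical, and it assumes the informer delivers something periodically (for example a resync callback), since an idle but healthy watch would otherwise look stale.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of kas-fleetshard): remembers when a watch
 * last delivered an event and reports whether it has been silent too long.
 */
public class WatchFreshnessMonitor {
    private final Supplier<Instant> now;          // injectable time source, eases testing
    private final Duration maxSilence;            // assumed threshold, chosen by the caller
    private final AtomicReference<Instant> lastEvent;

    public WatchFreshnessMonitor(Supplier<Instant> now, Duration maxSilence) {
        this.now = now;
        this.maxSilence = maxSilence;
        // Treat construction time as the first "event" so a new monitor is fresh.
        this.lastEvent = new AtomicReference<>(now.get());
    }

    /** Call from the event handler or resync callback whenever the watch delivers anything. */
    public void recordEvent() {
        lastEvent.set(now.get());
    }

    /** Returns true when no event has been recorded for longer than maxSilence. */
    public boolean isStale() {
        Duration silence = Duration.between(lastEvent.get(), now.get());
        return silence.compareTo(maxSilence) > 0;
    }

    public static void main(String[] args) {
        WatchFreshnessMonitor monitor =
                new WatchFreshnessMonitor(Instant::now, Duration.ofMinutes(5));
        monitor.recordEvent();
        System.out.println("stale: " + monitor.isStale());
    }
}
```

A health check could consult isStale() and report the sync as unhealthy so that the platform restarts it, which would only trigger a restart when the watch actually appears stuck. Whether that is preferable to a timed restart depends on the still unknown root cause.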

sonarcloud[bot] commented 1 year ago

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 1 (rating A)

Coverage: 72.0%
Duplication: 0.0%