bf2fc6cc711aee1a0c2a / kas-fleetshard

The kas-fleetshard-operator is responsible for provisioning and managing instances of Kafka on a cluster. The kas-fleetshard-synchronizer synchronizes the state of a fleet shard with the kas-fleet-manager.
Apache License 2.0

rough draft of horizontal-ish scaling #908

Closed shawkins closed 3 months ago

shawkins commented 1 year ago

@k-wall @rareddy this is the rough draft of FSO handling the scaling aspects mentioned in the horizontal (https://issues.redhat.com/browse/MGDSTRM-11100) and vertical (https://issues.redhat.com/browse/MGDSTRM-9976) issues.

This assumes three resource scenarios:

It doesn't appear that having 0 replicas will be a problem for OLM. However, OLM does double-check the hash it creates: https://github.com/operator-framework/operator-lifecycle-manager/blob/dac8182eb62acc1cb489d17ccc34f243f43d4f94/pkg/controller/install/deployment.go#L285

If it doesn't match for the operations in https://github.com/operator-framework/operator-lifecycle-manager/blob/783bebf6d4811c0b36eabaa5e58a05a000a1dbfc/pkg/controller/operators/olm/operator.go, then the CSV won't transition properly through its phases, logging errors or worse.

It does not seem like a good idea to go down the path of copying the hash functionality in Java, nor trying to time our updates to the deployment to match expectations with the OLM operator. So this is likely a dead end.

MikeEdgar commented 1 year ago

> It does not seem like a good idea to go down the path of copying the hash functionality in Java, nor trying to time our updates to the deployment to match expectations with the OLM operator. So this is likely a dead end.

Patching the deployment(s) within the CSV might be viable. I don't think OLM will attempt to modify or restore the CSV back to the version unpacked from the bundle.

shawkins commented 1 year ago

> Patching the deployment(s) within the CSV might be viable.

Thanks for the suggestion to move things up a level.

> I don't think OLM will attempt to modify or restore the CSV back to the version unpacked from the bundle.

At least locally it does not seem to. This will put the CSV back into the installing phase temporarily - are you aware of any monitoring or other reason that would be an issue?

MikeEdgar commented 1 year ago

> This will put the CSV back into the installing phase temporarily - are you aware of any monitoring or other reason that would be an issue?

It's possible that during the initial install the OCM status for the addon would take longer than otherwise to become Ready, but it won't impact readiness to provision a Kafka from KFM's perspective since it relies on the MKA's status for a Hybrid installation. We shouldn't have any monitoring of that at the service level.

shawkins commented 1 year ago

@k-wall @MikeEdgar @rareddy here's an update of the draft using the CSV modification approach. I've also narrowed it to just the horizontal / replica manipulation. The resource discussion will continue on https://issues.redhat.com/browse/MGDSTRM-9976 and https://issues.redhat.com/browse/MGDSTRM-11132, but if this PR seems acceptable, we can reintroduce this approach as an alternative as well.
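
For readers following along, here is a minimal sketch of what "replica manipulation at the CSV level" could look like. It is not the code from this PR. It models the ClusterServiceVersion as plain nested maps (a real implementation would go through a Kubernetes client) and relies only on the CSV layout `spec.install.spec.deployments[].spec.replicas`. The class name `CsvReplicaPatch`, its helper methods, and the deployment name `example-operator` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: edits the replica count of a deployment embedded in a
// CSV-shaped map instead of touching the Deployment that OLM manages.
class CsvReplicaPatch {

    // Returns the nested map stored under key, or null if absent.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String key) {
        return parent == null ? null : (Map<String, Object>) parent.get(key);
    }

    // Finds the deployment entry with the given name under
    // spec.install.spec.deployments, or null if there is none.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> findDeployment(Map<String, Object> csv, String name) {
        Map<String, Object> installSpec = child(child(child(csv, "spec"), "install"), "spec");
        if (installSpec == null || installSpec.get("deployments") == null) {
            return null;
        }
        for (Map<String, Object> d : (List<Map<String, Object>>) installSpec.get("deployments")) {
            if (name.equals(d.get("name"))) {
                return d;
            }
        }
        return null;
    }

    // Sets spec.replicas on the named deployment; returns false if not found.
    static boolean setReplicas(Map<String, Object> csv, String name, int replicas) {
        Map<String, Object> deployment = findDeployment(csv, name);
        if (deployment == null) {
            return false;
        }
        Map<String, Object> spec = child(deployment, "spec");
        if (spec == null) {
            spec = new HashMap<>();
            deployment.put("spec", spec);
        }
        spec.put("replicas", replicas);
        return true;
    }

    // Reads spec.replicas of the named deployment, or null if unset/not found.
    static Object getReplicas(Map<String, Object> csv, String name) {
        Map<String, Object> spec = child(findDeployment(csv, name), "spec");
        return spec == null ? null : spec.get("replicas");
    }

    // Builds a tiny CSV-shaped map with one deployment, for illustration only.
    static Map<String, Object> sampleCsv(String deploymentName, int replicas) {
        Map<String, Object> deploymentSpec = new HashMap<>();
        deploymentSpec.put("replicas", replicas);
        Map<String, Object> deployment = new HashMap<>();
        deployment.put("name", deploymentName);
        deployment.put("spec", deploymentSpec);
        List<Map<String, Object>> deployments = new ArrayList<>();
        deployments.add(deployment);
        Map<String, Object> installSpec = new HashMap<>();
        installSpec.put("deployments", deployments);
        Map<String, Object> install = new HashMap<>();
        install.put("strategy", "deployment");
        install.put("spec", installSpec);
        Map<String, Object> spec = new HashMap<>();
        spec.put("install", install);
        Map<String, Object> csv = new HashMap<>();
        csv.put("kind", "ClusterServiceVersion");
        csv.put("spec", spec);
        return csv;
    }

    public static void main(String[] args) {
        Map<String, Object> csv = sampleCsv("example-operator", 1);
        setReplicas(csv, "example-operator", 0);
        System.out.println("replicas = " + getReplicas(csv, "example-operator"));
    }
}
```

Because the change is made in the CSV rather than in the Deployment itself, OLM would be the one to roll it out, which is why the discussion above expects the CSV to pass through the installing phase again.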

sonarcloud[bot] commented 1 year ago

SonarCloud Quality Gate failed.

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 4 (rating A)

Coverage: 8.5%
Duplication: 0.0%