GoogleCloudPlatform / gke-autoneg-controller

This GKE controller provides simple custom integration between GKE and GCLB.
Apache License 2.0
161 stars 51 forks source link

#109 overrides external managed capacity scaling #113

Closed xinau closed 6 months ago

xinau commented 10 months ago

Hey :wave:, the change introduced in #109 results in the capacity scaler setting being overridden when being set by an external tool. This is a breakage to the previous behavior and can be potentially be dangerous for production environments.

Before, when a service had been created with an initial_capacity of 50 the NEGs would have been added with a capacity of 50%. When the capacity got changed on the NEGs via an external tool, the controller during reconciliation would leave it as it is.

With #109, when the capacity is changed and the controller reconciles, the capacity is set back to the initial_capacity.

This can be easily compared with the latest version and v1.0.0.

  1. create a service with a neg config for "initial_capacity": 50
  2. wait for the controller to add the NEGs with 50% to the backend
  3. update the capacity in the console to 75%
  4. wait for the console to update the capacity to 75%
  5. wait for or force a reconciliation by the controller
  6. check the capacity of the NEGs

I do see merit in having an "overriding" annotation as well as externally manageable way. This can probably be best achived by

Thanks for creating this tool.

cc @rosmo and @darkstarmv

darkstarmv commented 10 months ago

I can add a flag with default value of false to prevent sync, so original behavior is not changed

xinau commented 10 months ago

My 5 cent. I think restoring the original behavior would be the sane thing to do regardless, or properly announcing the breakage in the next release.

I think it's important to name either the flag properly or use another field as initial_capacity clearly indicates that this is the initial value and thus it can be assumed that it won't be causing any effect afterwards.

rosmo commented 9 months ago

@xinau Yes, any changes will be announced properly. I wouldn't recommend using master branch built versions in production.

@darkstarmv I'd be fine with both implementation (initial_capacity or flag update_capacity (or similar)).

xinau commented 9 months ago

@rosmo, Thanks for taking care of this.

rosmo commented 8 months ago

@darkstarmv Do you have some time to add flag or should I look into it?

darkstarmv commented 8 months ago

@rosmo I can work on it this week. My plan is:

  1. add controller.autoneg.dev/sync annotation, so users can set synchronization of K8s settings to Google per K8s service.
  2. If controller.autoneg.dev/sync: true - update capacityScaler with K8s service value
  3. If controller.autoneg.dev/sync: false or not present - do not update capacityScaler from k8s service value
rosmo commented 7 months ago

@darkstarmv Thanks, let me know if you need help.

rosmo commented 6 months ago

I went ahead and implemented the above, but with a slight twist, making the sync annotation more flexible. Please see #125.

The annotation would be controller.autoneg.dev/sync: '{"capacity_scaler":true}'

rosmo commented 6 months ago

@xinau Would you be able to test the current master? It should not override any existing settings.