crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0
142 stars 121 forks source link

[Bug]: Nodegroup late-initializes scaling config provided only in InitProvider #1469

Closed turkenh closed 2 weeks ago

turkenh commented 2 weeks ago

Is there an existing issue for this?

Affected Resource(s)

Resource MRs required to reproduce the bug

A NodeGroup with only spec.initProvider.scalingConfig (desired/initial/max size) but no spec.forProvider.scalingConfig.

Steps to Reproduce

  1. Create a NodeGroup with only spec.initProvider.scalingConfig but no spec.forProvider.scalingConfig
  2. Get the resource back once it is ready
  3. Check whether there is a spec.forProvider.scalingConfig

What happened?

spec.forProvider.scalingConfig late initialized with the values we provided in spec.initProvider.scalingConfig

This result in conflicts with the auto scalers where both are trying to desired node size.

Relevant Error Output Snippet

N/A

Crossplane Version

v1.16.0-up.1

Provider Version

v1.8.0

Kubernetes Version

No response

Kubernetes Distribution

No response

Additional Info

No response

haarchri commented 2 weeks ago

what we doing today looks like:

...
spec:
  initProvider:
    scalingConfig:
      - desiredSize: 5
  forProvider:
    scalingConfig:
      - minSize: 1
        maxSize: 10
mergenci commented 2 weeks ago

As far as I know, reported behavior is the intended behavior. In such cases, LateInitialize should be excluded from managementPolicies. initProvider design doc specifically gives EKS NodeGroup as an example for when LateInitialize management policy should be omitted in addition to specifying desiredSize in initProvider.

That being said, I must add that I'm not satisfied with this behavior. initProvider gives the illusion of being equivalent to ignore_changes in Terraform, yet it behaves differently. I've been bitten multiple times by this behavior and I've seen community members being puzzled by it.

I would be happy to contribute to efforts in changing initProvider behavior before it is promoted to GA.

mergenci commented 2 weeks ago

We discussed this issue with @turkenf and @sergenyalcin off-channel. @turkenf stated that we had deviated from the standard way of disabling LateInitialize management policy in favor of disabling the late initialization in resource configuration when appropriate. The reason was the problems we had with GCP's NodePool resource. See https://github.com/crossplane-contrib/provider-upjet-gcp/pull/600 for a recent instance.

@sergenyalcin reminded us of https://github.com/crossplane/upjet/pull/407, which is a backwards-compatible way of disabling late initialization in resource configuration.