crossplane-contrib / provider-upjet-aws

Official AWS Provider for Crossplane by Upbound.
https://marketplace.upbound.io/providers/upbound/provider-aws
Apache License 2.0

Problem with NodeGroup AMI #912

Closed · NikitaCloudRuntime · closed 13 hours ago

NikitaCloudRuntime commented 10 months ago

What happened?

The NodeGroup AMI doesn't follow the EKS cluster's version.

How can we reproduce it?

While investigating the possibility of switching to Crossplane for infrastructure provisioning, we noticed that after updating the Kubernetes version of an EKS cluster provisioned by Crossplane, the AMI version of the corresponding NodeGroup does not update.

We have a long XRD, so I'll include only the relevant resources.

Cluster

    - base:
        apiVersion: eks.aws.upbound.io/v1beta1
        kind: Cluster
        spec:
          forProvider:
            region: eu-west-1
            roleArnSelector:
              matchControllerRef: true
              matchLabels:
                role: controlplane
            vpcConfig:
              - endpointPrivateAccess: true
                endpointPublicAccess: true
      name: kubernetesCluster
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: status.subnetIds
          toFieldPath: spec.forProvider.vpcConfig[0].subnetIds
        - fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.version
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.identity[0].oidc[0].issuer
          toFieldPath: status.eks.oidc
          policy:
            fromFieldPath: Optional
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.identity[0].oidc[0].issuer
          toFieldPath: status.eks.oidcUri
          transforms:
            - type: string
              string:
                type: TrimPrefix
                trim: 'https://'
          policy:
            fromFieldPath: Optional
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.roleArn
          toFieldPath: status.eks.accountId
          transforms:
          - type: string
            string:
              type: Regexp
              regexp:
                match: 'arn:aws:iam::(\d+):.*'
                group: 1
          policy:
            fromFieldPath: Optional

NodeGroup

    - base:
        apiVersion: eks.aws.upbound.io/v1beta1
        kind: NodeGroup
        spec:
          forProvider:
            region: eu-west-1
            clusterNameSelector:
              matchControllerRef: true
            nodeRoleArnSelector:
              matchControllerRef: true
              matchLabels:
                role: nodegroup
            scalingConfig:
              - minSize: 1
                maxSize: 100
                desiredSize: 1
            instanceTypes:
              - t3.medium
      name: nodeGroup
      patches:
        - fromFieldPath: spec.parameters.nodes.count
          toFieldPath: spec.forProvider.scalingConfig[0].desiredSize
        - type: FromCompositeFieldPath
          fromFieldPath: status.subnetIds
          toFieldPath: spec.forProvider.subnetIds
        - fromFieldPath: spec.parameters.nodes.size
          toFieldPath: spec.forProvider.instanceTypes[0]
          transforms:
            - type: map
              map:
                small: t3.small
                medium: t3.medium
                large: t3.large

The initial version of the cluster was 1.25; later we updated it to the next minor release, 1.26. We can see that the EKS cluster version has been updated, but the NodeGroup AMI is still on 1.25:

apiVersion: eks.aws.upbound.io/v1beta1
kind: NodeGroup
metadata:
  annotations:
    crossplane.io/composition-resource-name: nodeGroup
    crossplane.io/external-create-succeeded: "2023-10-09T09:10:47Z"
    crossplane.io/external-name: hellofresh-eks-h482g
    upjet.crossplane.io/provider-meta: '{"e2bfb730-ecaa-11e6-8f88-34363bc7c4c0":{"create":3600000000000,"delete":3600000000000,"update":3600000000000}}'
  creationTimestamp: "2023-10-09T08:58:43Z"
  finalizers:
  - finalizer.managedresource.crossplane.io
  generateName: hellofresh-eks-
  generation: 9
  labels:
    crossplane.io/claim-name: ""
    crossplane.io/claim-namespace: ""
    crossplane.io/composite: hellofresh-eks
  name: hellofresh-eks-h482g
  ownerReferences:
  - apiVersion: aws.hellofresh.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: eks
    name: hellofresh-eks
    uid: 9d89b96f-dd1b-4887-85a9-79718a30bfb7
  resourceVersion: "12958143"
  uid: 48e6bc0e-d84a-4c59-9738-fc3dc14255e2
spec:
  deletionPolicy: Delete
  forProvider:
    amiType: AL2_x86_64
    capacityType: ON_DEMAND
    clusterName: hellofresh-eks-4xwgt
    clusterNameRef:
      name: hellofresh-eks-4xwgt
    clusterNameSelector:
      matchControllerRef: true
    diskSize: 20
    instanceTypes:
    - t3.small
    nodeRoleArn: arn:aws:iam::784668270227:role/hellofresh-eks-cbtc5
    nodeRoleArnRef:
      name: hellofresh-eks-cbtc5
    nodeRoleArnSelector:
      matchControllerRef: true
      matchLabels:
        role: nodegroup
    region: eu-west-1
    releaseVersion: 1.25.13-20231002
    scalingConfig:
    - desiredSize: 3
      maxSize: 100
      minSize: 1
    subnetIds:
    - subnet-01a0faa90ec8d8ab4
    - subnet-05d37a7fa7e2a177b
    tags:
      crossplane-kind: nodegroup.eks.aws.upbound.io
      crossplane-name: hellofresh-eks-h482g
      crossplane-providerconfig: default
    updateConfig:
    - maxUnavailable: 1
    version: "1.25"
  initProvider: {}
  managementPolicies:
  - '*'
  providerConfigRef:
    name: default
status:
  atProvider:
    amiType: AL2_x86_64
    arn: arn:aws:eks:eu-west-1:784668270227:nodegroup/hellofresh-eks-4xwgt/hellofresh-eks-h482g/26c589da-0945-f65a-a068-5596f0fc61d3
    capacityType: ON_DEMAND
    clusterName: hellofresh-eks-4xwgt
    diskSize: 20
    id: hellofresh-eks-4xwgt:hellofresh-eks-h482g
    instanceTypes:
    - t3.small
    nodeRoleArn: arn:aws:iam::784668270227:role/hellofresh-eks-cbtc5
    releaseVersion: 1.25.13-20231002
    resources:
    - autoscalingGroups:
      - name: eks-hellofresh-eks-h482g-26c589da-0945-f65a-a068-5596f0fc61d3
      remoteAccessSecurityGroupId: ""
    scalingConfig:
    - desiredSize: 3
      maxSize: 100
      minSize: 1
    status: ACTIVE
    subnetIds:
    - subnet-01a0faa90ec8d8ab4
    - subnet-05d37a7fa7e2a177b
    tags:
      crossplane-kind: nodegroup.eks.aws.upbound.io
      crossplane-name: hellofresh-eks-h482g
      crossplane-providerconfig: default
    tagsAll:
      crossplane-kind: nodegroup.eks.aws.upbound.io
      crossplane-name: hellofresh-eks-h482g
      crossplane-providerconfig: default
    updateConfig:
    - maxUnavailable: 1
      maxUnavailablePercentage: 0
    version: "1.25"
  conditions:
  - lastTransitionTime: "2023-10-09T09:10:47Z"
    reason: ReconcileSuccess
    status: "True"
    type: Synced
  - lastTransitionTime: "2023-10-09T09:13:51Z"
    reason: Available
    status: "True"
    type: Ready

According to the documentation, the version field is the "Kubernetes version. Defaults to EKS Cluster Kubernetes version."
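For reference, the same values can be confirmed from the AWS side (a sketch using the AWS CLI and the cluster/node group names from the dump above; adjust them to your environment):

    aws eks describe-nodegroup \
      --cluster-name hellofresh-eks-4xwgt \
      --nodegroup-name hellofresh-eks-h482g \
      --query 'nodegroup.{version: version, releaseVersion: releaseVersion}'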

What environment did it happen in?

haarchri commented 10 months ago

@NikitaCloudRuntime can you add the following patch for the NodeGroup as well, so that the EKS version is the same for the Cluster and the NodeGroup? Does this work for you?

        - fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.version
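For context, that patch would sit in the NodeGroup entry's patches list next to the existing ones, e.g. (a sketch based on the composition posted above):

      patches:
        - fromFieldPath: spec.parameters.nodes.count
          toFieldPath: spec.forProvider.scalingConfig[0].desiredSize
        - fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.version
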
NikitaCloudRuntime commented 10 months ago

Yep, it works, but shouldn't that be the default behaviour?

NikitaCloudRuntime commented 10 months ago

@haarchri it worked, but now we're facing another issue: when we try to delete the EKS cluster we get the following error:

  Conditions:
    Last Transition Time:  2023-10-10T05:40:27Z
    Reason:                ReconcileSuccess
    Status:                True
    Type:                  Synced
    Last Transition Time:  2023-10-17T06:25:26Z
    Reason:                Deleting
    Status:                False
    Type:                  Ready
    Last Transition Time:  2023-10-17T06:26:58Z
    Message:               destroy failed: deleting EKS Cluster (eks-test-hng4n): ResourceInUseException: Cluster has nodegroups attached
{
  RespMetadata: {
    StatusCode: 409,
    RequestID: "7d30250f-2076-44cb-a486-dafc6deb17c5"
  },
  ClusterName: "eks-test-hng4n",
  Message_: "Cluster has nodegroups attached",
  NodegroupName: "eks-test-frpf2"
}: 
    Reason:                DestroyFailure
    Status:                False
    Type:                  LastAsyncOperation
    Last Transition Time:  2023-10-10T05:48:58Z
    Reason:                Finished
    Status:                True
    Type:                  AsyncOperation
Events:
  Type    Reason                   Age                 From                                              Message
  ----    ------                   ----                ----                                              -------
  Normal  DeletedExternalResource  19s (x4 over 112s)  managed/eks.aws.upbound.io/v1beta1, kind=cluster  Successfully requested deletion of external resource

I think this dependency should somehow be handled by the controller.
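For what it's worth, newer Crossplane releases (1.14+) ship an alpha Usage API (behind the --enable-usages flag) that can express exactly this kind of deletion ordering. A minimal sketch, assuming the Cluster and NodeGroup are composed by the same composite (the Usage itself would also be added to the Composition so that matchControllerRef resolves):

    apiVersion: apiextensions.crossplane.io/v1alpha1
    kind: Usage
    metadata:
      name: nodegroup-uses-cluster
    spec:
      # the used resource; its deletion is blocked while the user still exists
      of:
        apiVersion: eks.aws.upbound.io/v1beta1
        kind: Cluster
        resourceSelector:
          matchControllerRef: true
      # the resource that uses it
      by:
        apiVersion: eks.aws.upbound.io/v1beta1
        kind: NodeGroup
        resourceSelector:
          matchControllerRef: true

With this in place, deletion of the Cluster is held back until the NodeGroup that uses it is gone, so the provider stops hitting ResourceInUseException while the node group is still being removed.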

haarchri commented 10 months ago

Once the NodeGroup deletion completes successfully, the provider will start removing the Cluster resource.
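One quick way to watch that teardown order (assuming kubectl access to the management cluster) is to watch everything in the managed category that Crossplane providers register for their resources:

    kubectl get managed -w

The Cluster should only disappear once the NodeGroup has been fully removed.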

github-actions[bot] commented 4 months ago

This provider repo does not have enough maintainers to address every issue. Since there has been no activity in the last 90 days it is now marked as stale. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

github-actions[bot] commented 2 weeks ago

This provider repo does not have enough maintainers to address every issue. Since there has been no activity in the last 90 days it is now marked as stale. It will be closed in 14 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

github-actions[bot] commented 13 hours ago

This issue is being closed since there has been no activity for 14 days since marking it as stale. If you still need help, feel free to comment or reopen the issue!