aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0

nodeRoleRef & subnetRefs in eks controller - the referenced resource is missing the target field #1812

Open tomitesh opened 1 year ago

tomitesh commented 1 year ago

Describe the bug

We use GitOps to deploy all assets with the aws-controllers-k8s framework. Specifically, we use the EKS controller to create EKS clusters, node groups, and add-ons.

When creating a node group (kind: Nodegroup), we get the error "the referenced resource is missing the target field" when we use reference fields such as nodeRoleRef and subnetRefs.

Steps to reproduce

  1. Use nodeRoleRef instead of nodeRole, or
  2. Use subnetRefs instead of subnets.

Although the resource deploys successfully, its status field shows the following error in the conditions section:

message: the referenced resource is missing the target field. resource:Role, namespace:control,
  name:flash-rancher-eks-worker, targetField:Status.ACKResourceMetadata.ARN
status: Unknown
type: ACK.ReferencesResolved

The Nodegroup manifest:

kind: Nodegroup
metadata:
  name: flash-rancher-nodegroup
  namespace: control
spec:
  amiType: AL2_x86_64
  capacityType: SPOT
  clusterName: flash-rancher
  diskSize: 20
  instanceTypes:
    - m4.xlarge
    - m5.xlarge
  name: flash-rancher-nodegroup
  nodeRoleRef:
    from:
      name: flash-rancher-eks-worker
  releaseVersion: 1.25.9-20230513
  scalingConfig:
    desiredSize: 1
    maxSize: 1
    minSize: 1
  subnets:
    - subnet-111111111111111111111
    - subnet-2222222222222222
    - subnet-333333333333333
#  subnetRefs:
#    - from:
#        name: app1-sub
#    - from:
#        name: app2-sub
#    - from:
#        name: app3-sub
  updateConfig:
    maxUnavailable: 1
  version: "1.25"

Expected outcome

Environment: development

RedbackThomson commented 1 year ago

Could you provide the description of the flash-rancher-eks-worker Role?

I have a feeling that this resource isn't properly being created - and that's why it doesn't have an ARN. Also, maybe a silly question, but do you have the iam-controller installed into the cluster?

tomitesh commented 1 year ago

Thanks @RedbackThomson for the quick response.


Note: I have anonymized the data by replacing it with "xxxxxxxxxxxx".

If I use the ARN with nodeRole, it works:
        nodeRole: arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker
If I use nodeRoleRef, it fails with the error "the referenced resource is missing the target field":
        nodeRoleRef:
          from:
            name: flash-rancher-eks-worker

apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: control-repo-cd-control-cluster-iam
    meta.helm.sh/release-namespace: control
  finalizers:
    - finalizers.iam.services.k8s.aws/Role
  labels:
    app.kubernetes.io/managed-by: Helm
  name: flash-rancher-eks-worker
  namespace: control
spec:
  assumeRolePolicyDocument: '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  inlinePolicies:
    session-manager-logs: |-
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs/*"
          },
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:GetEncryptionConfiguration",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-dev-session-manager-logs"
          }
        ]
      }
    fluentDCloudWatchLogging: |-
      {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Action": "logs:DescribeLogGroups",
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:*:*",
              "Sid": "FluentDCloudWatchLoggingViewLogGroups"
            },
            {
              "Action": [
                    "logs:PutRetentionPolicy",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup"
                ],
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:/k8s/*:*",
              "Sid": "FluentDCloudWatchLoggingWrite"
            }
          ]
      }
  maxSessionDuration: 3600
  name: flash-rancher-eks-worker
  path: /
  policies:
    - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
    - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
    - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
    - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy

Kindly let me know if you need more info.

RedbackThomson commented 1 year ago

Could you provide the output of kubectl get roles -n control flash-rancher-eks-worker? Specifically, the status of that object should either show some sort of error, or contain the ARN, in which case the fault is somewhere else.

However, just from a cursory glance at the Role, I'd double check that the Resource fields within the fluentDCloudWatchLogging statements are correct - they don't look like valid ARNs.
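As a rough aid for that kind of check, here is a minimal Python sketch that splits an ARN into the five colon-separated fields that follow the arn prefix (partition, service, region, account ID, resource). The parse_arn function and the Arn tuple are hypothetical names made up for this illustration and are not part of ACK or any AWS SDK. The sketch only checks the overall structure, not whether the resource exists, and since the account ID in this thread is anonymized it cannot be validated here.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    """Hypothetical container for the fields of an ARN."""
    partition: str
    service: str
    region: str      # empty for global services such as IAM and for S3 buckets
    account_id: str  # empty for S3 buckets and AWS managed policies use "aws"
    resource: str    # may itself contain ":" or "/" (e.g. "log-group:/k8s/*:*")


def parse_arn(value: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource.

    Only the first five colons are treated as separators, so colons inside
    the resource part are preserved. Raises ValueError on a malformed value.
    """
    parts = value.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {value!r}")
    _, partition, service, region, account_id, resource = parts
    if not partition or not service or not resource:
        raise ValueError(f"ARN is missing a required field: {value!r}")
    return Arn(partition, service, region, account_id, resource)


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID, not a value from this thread.
    print(parse_arn("arn:aws:logs:eu-central-1:123456789012:log-group:/k8s/*:*"))
    print(parse_arn("arn:aws:s3:::example-bucket/*"))
```

Splitting on at most five colons is the design choice that matters here: CloudWatch Logs resources such as log-group:/k8s/*:* contain colons of their own, and a plain split would break them apart.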

tomitesh commented 1 year ago

The output of the command "kubectl get roles -n control flash-rancher-eks-worker" in YAML format is below:

apiVersion: iam.services.k8s.aws/v1alpha1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: control-repo-cd-control-cluster-iam
    meta.helm.sh/release-namespace: control
    objectset.rio.cattle.io/id: default-control-repo-cd-control-cluster-iam-cattle-fleet-d3f1da
    services.k8s.aws/deletion-policy: retain
  creationTimestamp: "2023-05-29T20:20:02Z"
  finalizers:
  - finalizers.iam.services.k8s.aws/Role
  generation: 2
  labels:
    app.kubernetes.io/managed-by: Helm
    objectset.rio.cattle.io/hash: 50746d8429094aa76c6283a7a838da5a62dbb312
  name: flash-rancher-eks-worker
  namespace: control
  resourceVersion: "12150677"
  uid: b22ed0b2-82a4-4d20-956e-9de9e167d19c
spec:
  assumeRolePolicyDocument: '{"Version":"2012-10-17","Statement":[{"Sid":"","Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
  inlinePolicies:
    fluentDCloudWatchLogging: |-
      {
        "Version": "2012-10-17",
        "Statement": [
            {
              "Action": "logs:DescribeLogGroups",
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:*:*",
              "Sid": "FluentDCloudWatchLoggingViewLogGroups"
            },
            {
              "Action": [
                    "logs:PutRetentionPolicy",
                    "logs:PutLogEvents",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:CreateLogGroup"
                ],
              "Effect": "Allow",
              "Resource": "arn:aws:logs:eu-central-1:xxxxxxxxxxxx:log-group:/k8s/*:*",
              "Sid": "FluentDCloudWatchLoggingWrite"
            }
          ]
      }
    session-manager-logs: |-
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs/*"
          },
          {
            "Sid": "",
            "Effect": "Allow",
            "Action": "s3:GetEncryptionConfiguration",
            "Resource": "arn:aws:s3:::xxxxxxxxxxxx-dev-session-manager-logs"
          }
        ]
      }
  maxSessionDuration: 3600
  name: flash-rancher-eks-worker
  path: /
  policies:
  - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  - arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
  - arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
  - arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
status:
  ackResourceMetadata:
    arn: arn:aws:iam::xxxxxxxxxxxx:role/flash-rancher-eks-worker
    ownerAccountID: "xxxxxxxxxxxx"
    region: eu-central-1
  conditions:
  - lastTransitionTime: "2023-06-06T12:05:06Z"
    message: Late initialization successful
    reason: Late initialization successful
    status: "True"
    type: ACK.LateInitialized
  - lastTransitionTime: "2023-06-06T12:05:06Z"
    message: Resource synced successfully
    reason: ""
    status: "True"
    type: ACK.ResourceSynced
  createDate: "2023-05-17T10:50:29Z"
  roleID: AROA2PXUAJR2KGGG7BHYV
  roleLastUsed:
    lastUsedDate: "2023-06-06T11:47:36Z"
    region: eu-central-1

tomitesh commented 1 year ago

Questions: if I use the EKS controller to deploy a Nodegroup and

  1. specify nodeRoleRef to look up the Role by name, does it require the IAM controller along with a Role resource deployed on the cluster?
  2. specify subnetRefs to look up subnets by name, does it require the EC2 controller along with Subnet resources deployed on the cluster?

RedbackThomson commented 1 year ago

specify nodeRoleRef to look up the Role by name, does it require the IAM controller along with a Role resource deployed on the cluster?

The EKS controller doesn't explicitly require the IAM controller in its code, but it does require that an ACK IAM Role was created using that controller and has the ACK.ResourceSynced condition set to True in its status.

specify subnetRefs to look up subnets by name, does it require the EC2 controller along with Subnet resources deployed on the cluster?

Same as IAM, nothing explicitly in the code, but the resource is required to be created by that controller.
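To make the two answers above concrete, here is a minimal Python sketch of the kind of check being described: the referenced object must report ACK.ResourceSynced as True and must carry an ARN under status.ackResourceMetadata. This is an illustration under assumptions, not the controller's actual Go code. The function name resolve_arn_reference and the "not synced" message are hypothetical; only the "missing the target field" wording is taken from the error reported in this issue.

```python
def resolve_arn_reference(resource: dict) -> str:
    """Return the ARN of a referenced ACK resource, or raise ValueError.

    `resource` is the referenced object as a plain dict, for example the
    parsed output of `kubectl get ... -o json`.
    """
    status = resource.get("status") or {}
    conditions = status.get("conditions") or []

    # The referenced resource must have been reconciled by its own controller.
    synced = any(
        cond.get("type") == "ACK.ResourceSynced" and cond.get("status") == "True"
        for cond in conditions
    )
    if not synced:
        # Hypothetical message, chosen for this sketch only.
        raise ValueError("the referenced resource is not synced yet")

    # The target field for a Role reference is Status.ACKResourceMetadata.ARN.
    arn = (status.get("ackResourceMetadata") or {}).get("arn")
    if not arn:
        raise ValueError("the referenced resource is missing the target field")
    return arn


if __name__ == "__main__":
    # Shape modeled on the Role status posted earlier in this thread;
    # 123456789012 is a placeholder account ID.
    role = {
        "status": {
            "ackResourceMetadata": {
                "arn": "arn:aws:iam::123456789012:role/flash-rancher-eks-worker"
            },
            "conditions": [{"type": "ACK.ResourceSynced", "status": "True"}],
        }
    }
    print(resolve_arn_reference(role))
```

By this logic, a Role whose status looks like the one posted above would be expected to resolve, which is consistent with the observation that the resources appear well formed.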

RedbackThomson commented 1 year ago

I don't see anything wrong with your resources or your logic. The role looks well formed, and it has the conditions required for it to be referenced by the controller.

I think the only other possible cause of that error is that you tried to create the Nodegroup before the IAM controller had created the Role. However, the EKS controller should retry the creation of the Nodegroup (with exponential backoff) until the Role can be referenced, and then it should proceed.
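For readers unfamiliar with the retry behaviour mentioned above, the following Python sketch shows the general idea of retrying with exponential backoff: each failed attempt doubles the wait before the next one, up to a cap. It is a generic illustration, not the ACK runtime's implementation. The function name retry_with_backoff, the attempt limit, and the delay values are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,      # assumed limit, for illustration only
    base_delay: float = 1.0,    # assumed first delay in seconds
    max_delay: float = 60.0,    # assumed cap on the delay
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds, doubling the delay after each failure.

    The last exception is re-raised once `max_attempts` is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Double the wait for the next attempt, but never exceed the cap.
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without real waiting, for example by supplying a function that records the requested delays.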

ack-bot commented 11 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 9 months ago

Stale issues rot after 60d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 60d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

gecube commented 8 months ago

/remove-lifecycle rotten

ack-bot commented 2 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

gecube commented 2 months ago

/remove-lifecycle stale