aws-controllers-k8s / community

AWS Controllers for Kubernetes (ACK) is a project enabling you to manage AWS services from Kubernetes
https://aws-controllers-k8s.github.io/community/
Apache License 2.0

Latest EC2 Controller Forces `tags` to be in CRs #1493

Open acornett21 opened 1 year ago

acornett21 commented 1 year ago

Describe the bug

With the latest release of the EC2 controller, tags were added to the VPC CR. With this addition, the controller now fails if a user does not give the tags attribute when applying the CR.

Steps to reproduce

  1. Install the operator on OpenShift (I assume the same issue is on k8s)
  2. Apply a new VPC CR with no tags (oc apply -f or via the web UI)
    oc apply -f - <<'EOF'
    apiVersion: ec2.services.k8s.aws/v1alpha1
    kind: VPC
    metadata:
      name: rosa-rds-vpc
      namespace: ack-workspace
    spec:
      cidrBlocks:
        - 100.68.0.0/18
      enableDNSHostnames: true
      enableDNSSupport: true
    EOF
  3. Tail the logs of the controller pod, and see the below failure.
    2022-09-27T14:56:47.379Z    ERROR   controller.vpc  Reconciler error    {"reconciler group": "ec2.services.k8s.aws", "reconciler kind": "VPC", "name": "rosa-rds-vpc", "namespace": "ack-workspace", "error": "InvalidRequest: The request received was invalid.\n\tstatus code: 400, request id: 31966f27-02b3-4481-ad3d-5193706ebd29"}

Expected outcome

I'd expect the above CR to work properly, since tags are supposed to be optional. When tags are added to the above CR, the controller reconciles the resource, i.e.:

apiVersion: ec2.services.k8s.aws/v1alpha1
kind: VPC
metadata:
  name: rosa-rds-vpc
  namespace: ack-workspace
spec:
  cidrBlocks:
    - 100.68.0.0/18
  enableDNSHostnames: true
  enableDNSSupport: true
  tags:
    - key: name
      value: rosa-rds

It would be great if the tests for this controller were also updated to have some positive and negative tests, i.e. with and without the tags field present in the CR. To me it looks like all current tests have tags present.


brycahta commented 1 year ago

Hi @acornett21, I'm testing on my EKS cluster with ec2-controller v0.0.20 and cannot reproduce the issue. I'm not too familiar with OpenShift; is the Operator using the latest controller version?


[29/09/22 12:30:40] ➜  test-yaml cat vpc-test.yaml
apiVersion: ec2.services.k8s.aws/v1alpha1
kind: VPC
metadata:
  name: vpc-ack-test
spec:
  cidrBlocks:
  - 10.0.0.0/16
  enableDNSSupport: true
  enableDNSHostnames: true

[29/09/22 12:30:44] ➜  test-yaml kubectl apply -f vpc-test.yaml
vpc.ec2.services.k8s.aws/vpc-ack-test created

[29/09/22 12:30:56] ➜  test-yaml kubectl describe vpcs
Name:         vpc-ack-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  ec2.services.k8s.aws/v1alpha1
Kind:         VPC
Metadata:
  Creation Timestamp:  2022-09-29T17:30:56Z
  Finalizers:
    finalizers.ec2.services.k8s.aws/VPC
  Generation:  2
<truncated for readability...>
Spec:
  Cidr Blocks:
    10.0.0.0/16
  Enable DNS Hostnames:  true
  Enable DNS Support:    true
  Instance Tenancy:      default
Status:
  Ack Resource Metadata:
    Owner Account ID:  <ID>
    Region:            us-west-2
  Cidr Block Association Set:
    Association ID:  vpc-cidr-assoc-<ID>
    Cidr Block:      10.0.0.0/16
    Cidr Block State:
      State:  associated
  Conditions:
    Last Transition Time:  2022-09-29T17:30:58Z
    Message:               Resource synced successfully
    Reason:
    Status:                True
    Type:                  ACK.ResourceSynced
  Dhcp Options ID:         dopt-<ID>
  Is Default:              false
  Owner ID:                <ID>
  State:                   available
  Vpc ID:                  vpc-<ID>

brycahta commented 1 year ago

It would be great if the tests for this controller were also updated to have some positive and negative tests, i.e. with and without the tags field present in the CR. To me it looks like all current tests have tags present.

+1 , improvements to e2e tests are ongoing; I'll be sure to include this.

acornett21 commented 1 year ago

Hi @acornett21, I'm testing on my EKS cluster with ec2-controller v0.0.20 and cannot reproduce the issue. I'm not too familiar with OpenShift; is the Operator using the latest controller version?

The OpenShift operator is packaged on release of the controller code, so whatever is in the controller image for the runtime version is the same in OpenShift as in k8s. Are there any flags/params to pass to the controller to get more logs than the 400 bad request?

acornett21 commented 1 year ago

@brycahta How are you deploying into your k8s cluster? Helm or k8s manifests?

brycahta commented 1 year ago

The OpenShift operator is packaged on release of the controller code, so whatever is in the controller image for the runtime version is the same in OpenShift as in k8s. Are there any flags/params to pass to the controller to get more logs than the 400 bad request?

If you have the code locally, I usually use this go command:

go run ./cmd/controller/main.go \
--aws-region=eu-west-2 \
--enable-development-logging \
--log-level=debug

If you're installing from helm:

helm install --create-namespace -n $ACK_SYSTEM_NAMESPACE ack-$SERVICE-controller \
  oci://public.ecr.aws/aws-controllers-k8s/$SERVICE-chart --version=$RELEASE_VERSION --set=aws.region=$AWS_REGION \
  --set=log.level=info --set=log.enable_development_logging=true

brycahta commented 1 year ago

How are you deploying into your k8s cluster? Helm or k8s manifests?

I'm using a k8s manifest. I have the code checked out, run the go run command above, then kubectl apply -f vpc-test.yaml (manifest in my first comment).

acornett21 commented 1 year ago

Here is some more info from the requests that end up in AWS... I think tagSpecificationSet is always being included in the request, and on OpenShift it is empty:

 "requestParameters": {
        "cidrBlock": "100.68.0.0/18",
        "tagSpecificationSet": {
            "items": [
                {
                    "resourceType": "vpc",
                    "tags": [
                        {}
                    ]
                }
            ]
        }
    },

and on k8s it is not:

        "requestParameters": {
        "cidrBlock": "100.68.0.0/18",
        "tagSpecificationSet": {
            "items": [
                {
                    "resourceType": "vpc",
                    "tags": [
                        {
                            "key": "services.k8s.aws/controller-version",
                            "value": "ec2-v0.0.20"
                        },
                        {
                            "key": "services.k8s.aws/namespace",
                            "value": "ack-system"
                        }
                    ]
                }
            ]
        }
    },

The k8s install would have the values here fed into the controller by default,

whereas in OpenShift this would not be present (here).

So the controller fails when it goes to look for those. Is the ACK_RESOURCE_TAGS env var that is used by default in the helm/k8s manifest installs now a REQUIRED argument for all controllers? That might be more of a question for @RedbackThomson or @jaypipes
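
For context, ACK_RESOURCE_TAGS carries a comma-separated list of key=value pairs, matching the default tags seen in the k8s request above. A minimal sketch of how such a value could be split into tags, using a hypothetical parseResourceTags helper (illustrative only, not the runtime's actual parsing code):

package main

import (
	"fmt"
	"os"
	"strings"
)

// parseResourceTags splits a comma-separated list of key=value pairs, e.g.
// "services.k8s.aws/controller-version=ec2-v0.0.20,services.k8s.aws/namespace=ack-system".
// Hypothetical helper for illustration; the real logic lives in the ACK runtime.
func parseResourceTags(raw string) map[string]string {
	if raw == "" {
		return nil // unset env var: no default tags at all
	}
	tags := map[string]string{}
	for _, pair := range strings.Split(raw, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			continue // skip malformed entries
		}
		tags[k] = v
	}
	return tags
}

func main() {
	fmt.Println(parseResourceTags(os.Getenv("ACK_RESOURCE_TAGS")))
}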

brycahta commented 1 year ago

@acornett21 I appreciate the help deep-diving this. I'm looking more on my end, and runtime::GetDefaultTags seems like it can handle nil config values for ResourceTags. However, the configuring-default-tags documentation implies that the config needs to be present whether you want default tags or not:

To remove the ACK default tags, set the resourceTags Helm value to be {} inside values.yaml file or use --set 'resourceTags={}' during helm chart installation.

@jaypipes @RedbackThomson @A-Hilaly would y'all be able to help us clarify the expected behavior?

vijtrip2 commented 1 year ago

I will take a look at the complete issue later today, but I can clarify the default behavior below:

However, the configuring-default-tags documentation implies that the config needs to be present whether you want default tags or not

If you do not provide any command-line override for resource tags during helm installation, these defaults from the helm values.yaml are used. The ACK team wanted this to be the default controller behavior for helm installations.

A parameter during helm installation is only required if you wish to use no tags "{}" or want to override the default tags.

acornett21 commented 1 year ago

This is the confusing part for me: if we say that CR.Tags is optional but then give it a default value, the field is not optional; it's a required field with a default value.

If we need a default value in OpenShift as well, the OLM template would need to change, and probably the code, so that the default values get populated at build/release time of a controller. (Assuming this is a concept in all controllers.)

vijtrip2 commented 1 year ago

Good feedback, @acornett21. During the release of this feature I did not test with OpenShift, and forgot to take its behavior and templates into account. I primarily focused on customer behavior with helm installations. It's a good learning.

@A-Hilaly, will it be alright if I explain the issue to you offline and you own the fix?

acornett21 commented 1 year ago

@vijtrip2 No worries. The template update would look something like this:

- op: add
  path: '/spec/template/spec/containers/0/env/1'
  value:
    name: ACK_RESOURCE_TAGS
    value: "services.k8s.aws/controller-version=%CONTROLLER_SERVICE%-%CONTROLLER_VERSION%,services.k8s.aws/namespace=%K8S_NAMESPACE%"

but I'm not sure how these values would be populated, i.e. in code (olm.go) or during the build process (the OLM sh files). Let me know if you all need more help.
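
For illustration, populating the %...% placeholders in the template above could be a simple string substitution at build/release time. The sketch below is hypothetical and is not the project's actual olm.go or OLM shell-script logic; note also that %K8S_NAMESPACE% may only be knowable at install time rather than build time, which is part of the open question above.

package main

import (
	"fmt"
	"strings"
)

// renderOLMTemplate expands the %...% placeholders used in the OLM patch
// above. Purely illustrative; the real population would happen in the
// project's build tooling (olm.go / the OLM shell scripts).
func renderOLMTemplate(tmpl, service, version, namespace string) string {
	return strings.NewReplacer(
		"%CONTROLLER_SERVICE%", service,
		"%CONTROLLER_VERSION%", version,
		"%K8S_NAMESPACE%", namespace,
	).Replace(tmpl)
}

func main() {
	tmpl := "services.k8s.aws/controller-version=%CONTROLLER_SERVICE%-%CONTROLLER_VERSION%,services.k8s.aws/namespace=%K8S_NAMESPACE%"
	fmt.Println(renderOLMTemplate(tmpl, "ec2", "v0.0.20", "ack-system"))
	// services.k8s.aws/controller-version=ec2-v0.0.20,services.k8s.aws/namespace=ack-system
}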

brycahta commented 1 year ago

The root cause of the invalid request is hook code in the ec2-controller, specifically updateTagSpecificationsInCreateRequest:

input.TagSpecifications = []*svcsdk.TagSpecification{&desiredTagSpecs}

This is always assigned even if desiredTagSpecs is empty. The fix will be to update all resources' hook code to move this assignment inside the if r.ko.Spec.Tags != nil check, as well as setting input.TagSpecifications = nil at the start of the function.
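
A minimal sketch of that fix for the VPC create path, using the aws-sdk-go EC2 types; the resource and Tag stand-ins below are simplified placeholders for the controller's generated types, and only the nil-reset plus nil-check mirrors the described change:

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	svcsdk "github.com/aws/aws-sdk-go/service/ec2"
)

// Simplified stand-ins for the controller's generated types.
type Tag struct {
	Key   *string
	Value *string
}

type resource struct {
	ko struct {
		Spec struct{ Tags []*Tag }
	}
}

// updateTagSpecificationsInCreateRequest, reworked per the fix described
// above: clear TagSpecifications first and only build/assign the slice when
// the CR actually carries tags, so an empty spec never sends an empty
// TagSpecification entry (the cause of the InvalidRequest error).
func updateTagSpecificationsInCreateRequest(r *resource, input *svcsdk.CreateVpcInput) {
	input.TagSpecifications = nil
	if r.ko.Spec.Tags == nil {
		return
	}
	desiredTagSpecs := svcsdk.TagSpecification{
		ResourceType: aws.String(svcsdk.ResourceTypeVpc),
	}
	for _, t := range r.ko.Spec.Tags {
		desiredTagSpecs.Tags = append(desiredTagSpecs.Tags, &svcsdk.Tag{
			Key:   t.Key,
			Value: t.Value,
		})
	}
	input.TagSpecifications = []*svcsdk.TagSpecification{&desiredTagSpecs}
}

func main() {
	input := &svcsdk.CreateVpcInput{}
	updateTagSpecificationsInCreateRequest(&resource{}, input)
	fmt.Println(input.TagSpecifications == nil) // true: no tags, no tag spec sent
}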

ack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

a-hilaly commented 1 year ago

Hey @acornett21 was this issue fixed, or shall we leave this open?

acornett21 commented 1 year ago

@A-Hilaly I didn't realize a PR went in for this. I'll retest this week.

acornett21 commented 1 year ago

@A-Hilaly This is still a bug; I still get the same error as reported above. This was tested with version 1.0.1 of the controller.

a-hilaly commented 1 year ago

thank you @acornett21 ! 🙇

ack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale

ack-bot commented 1 year ago

Stale issues rot after 60d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 60d of inactivity. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle rotten

ack-bot commented 10 months ago

Rotten issues close after 60d of inactivity. Reopen the issue with /reopen. Provide feedback via https://github.com/aws-controllers-k8s/community. /close

ack-prow[bot] commented 10 months ago

@ack-bot: Closing this issue.

In response to [this](https://github.com/aws-controllers-k8s/community/issues/1493#issuecomment-1802889367):

> Rotten issues close after 60d of inactivity.
> Reopen the issue with `/reopen`.
> Provide feedback via https://github.com/aws-controllers-k8s/community.
> /close

acornett21 commented 10 months ago

/reopen

ack-prow[bot] commented 10 months ago

@acornett21: Reopened this issue.

In response to [this](https://github.com/aws-controllers-k8s/community/issues/1493#issuecomment-1802890218):

> /reopen

ack-bot commented 4 months ago

Issues go stale after 180d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 60d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Provide feedback via https://github.com/aws-controllers-k8s/community. /lifecycle stale