aws / karpenter-provider-aws

Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
https://karpenter.sh
Apache License 2.0
6.15k stars 849 forks source link

chore: Deny http access aws sqs queue #6395

Closed prad9192 closed 6 days ago

prad9192 commented 1 week ago

Fixes https://github.com/aws/karpenter-provider-aws/issues/5901

Description Appended a policy statement to the existing queue policy to effectively deny any actions (sqs:*) on the specified SQS queue when the request is not using a secure transport (HTTP instead of HTTPS).

How was this change tested?

  1. Deployed the resources to a local aws cluster.
    
    aws
    cloudformation deploy --stack-name "Karpenter-${CLUSTER_NAME}" --capabilities CAPABILITY_NAMED_IAM --parameter-overrides "ClusterName=${CLUSTER_NAME}" --template-file "website/content/en/v0.37/getting-started/getting-started-with-karpenter/cloudformation.yaml"

Waiting for changeset to be created.. Waiting for stack create/update to complete Successfully created/updated stack - Karpenter-pash-karpenter-demo

2. Policy is updated as below.

aws sqs get-queue-attributes --queue-url https://sqs.{AWS_REGION}.amazonaws.com/{AWS_ACCOUNT_ID}/pash-karpenter-demo --attribute-names Policy | jq '.Attributes.Policy | fromjson | walk(if type == "string" and test("arn:aws:sqs:{AWS_REGION}:([0-9]+):pash-karpenter-demo") then sub("arn:aws:sqs:{AWS_REGION}:[0-9]+:pash-karpenter-demo"; "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:pash-karpenter-demo") else . end)'

{ "Version": "2008-10-17", "Id": "EC2InterruptionPolicy", "Statement": [ { "Effect": "Allow", "Principal": { "Service": [ "sqs.amazonaws.com", "events.amazonaws.com" ] }, "Action": "sqs:SendMessage", "Resource": "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:pash-karpenter-demo" }, { "Sid": "DenyHTTP", "Effect": "Deny", "Principal": "", "Action": "sqs:", "Resource": "arn:aws:sqs:{AWS_REGION}:{AWS_ACCOUNT_ID}:pash-karpenter-demo", "Condition": { "Bool": { "aws:SecureTransport": "false" } } } ] }


3. If there are any spot interruptions new instances are bootstrapped.

k get nodeclaims.karpenter.sh -o wide NAME TYPE ZONE NODE READY AGE CAPACITY NODEPOOL NODECLASS general-qkdhk t3.small {AWS_REGION-ZONE} ip-x-x-x-x.{AWS_REGION-ZONE}.compute.internal True 23h spot general general



**Does this change impact docs?**
- [ ] Yes, PR includes docs updates <!-- docs must be added to /preview to be included in future version releases -->
- [ ] Yes, issue opened: # <!-- issue number -->
- [x] No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
netlify[bot] commented 1 week ago

Deploy Preview for karpenter-docs-prod canceled.

Name Link
Latest commit 7133b17254ab9cf505056aa0bf0f9ded827fb201
Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/667b679ab902a9000892680a
jonathan-innis commented 1 week ago

@prad9192 I made a comment on the initial issue that was opened. I'm not 100% convinced that I see the value in opening up the ability for users to specify the policy through our Getting Started template.

Mind providing me a little bit of insight into your use-case?

prad9192 commented 1 week ago

Hey @jonathan-innis ,

We are trying to make sure that AWS security best practices are covered for SQS policy.

SQS security best practicesSQS security best practicesSQS security best practices

The whole intention is to solve security gap above, but I see a point in the comments you left on https://github.com/aws/karpenter-provider-aws/issues/5901. We can wait for the Original user to respond before taking this forward.

Reference: https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2975

jonathan-innis commented 1 week ago

The whole intention is to solve security gap above

Got it. I think I get this now. Change seems reasonable to me. Let's run a round of E2E tests against it and ensure that everything passes 👍🏼

github-actions[bot] commented 1 week ago

Snapshot successfully published to oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-05e0b1838b461b2acaecfe5a7f5a176e184990dd. To install you must login to the ECR repo with an AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-05e0b1838b461b2acaecfe5a7f5a176e184990dd" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait
coveralls commented 1 week ago

Pull Request Test Coverage Report for Build 9646142973

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details


Totals Coverage Status
Change from base Build 9641526953: 0.0%
Covered Lines: 5551
Relevant Lines: 6730

💛 - Coveralls
coveralls commented 1 week ago

Pull Request Test Coverage Report for Build 9671788140

Details


Files with Coverage Reduction New Missed Lines %
pkg/providers/amifamily/ami.go 1 90.56%
<!-- Total: 1 -->
Totals Coverage Status
Change from base Build 9669988716: -0.01%
Covered Lines: 5550
Relevant Lines: 6730

💛 - Coveralls
prad9192 commented 6 days ago

/status

jmdeal commented 6 days ago

Can we also update the CloudFormation Reference Docs with the new statement? It would be good to reference out to the referenced SQS best practices to explain this default. We can do this in a followup.

prad9192 commented 5 days ago

The whole intention is to solve security gap above

Got it. I think I get this now. Change seems reasonable to me. Let's run a round of E2E tests against it and ensure that everything passes 👍🏼


helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-05e0b1838b461b2acaecfe5a7f5a176e184990dd" --namespace "kube-system" --create-namespace \
--set "settings.clusterName=${CLUSTER_NAME}" \
--set "settings.interruptionQueue=${CLUSTER_NAME}" \
--set controller.resources.requests.cpu=1 \
--set controller.resources.requests.memory=1Gi \
--set controller.resources.limits.cpu=1 \
--set controller.resources.limits.memory=1Gi \
--wait
Pulled: 021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-05e0b1838b461b2acaecfe5a7f5a176e184990dd
Digest: sha256:f3c9ae8e6f490e28b85af7b761b16029d577d345a7376de7001b4dc133307a2b
Release "karpenter" has been upgraded. Happy Helming!
NAME: karpenter
LAST DEPLOYED: Wed Jun 26 05:17:40 2024
NAMESPACE: kube-system

k get po karpenter-6f98c4f678-c8bh9 -o yaml | yq '.spec.containers[].image' 021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/controller:05e0b1838b461b2acaecfe5a7f5a176e184990dd@sha256:7de44eab823966a7d70d8d10e8211dc465054647cb5f483f80e26029ce6d5c0f

Ran 17 of 17 Specs in 1904.108 seconds SUCCESS! -- 17 Passed | 0 Failed | 0 Pending | 0 Skipped --- PASS: TestAMI (1904.11s) W0626 06:19:54.865192 36321 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232: watch of v1beta1.NodeClaim ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding PASS W0626 06:19:54.865219 36321 reflector.go:470] pkg/mod/k8s.io/client-go@v0.30.2/tools/cache/reflector.go:232: watch of v1.PartialObjectMetadata ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding ok github.com/aws/karpenter-provider-aws/test/suites/ami 1905.198s



cc: @jonathan-innis 
prad9192 commented 5 days ago

Can we also update the CloudFormation Reference Docs with the new statement? It would be good to reference out to the referenced SQS best practices to explain this default. We can do this in a followup.

This done here - https://github.com/aws/karpenter-provider-aws/pull/6416