amazon-archives / aws-service-operator

AWS Service Operator allows you to create AWS resources using kubectl.

Operator fails after deployment - with access denied error for sqs #173

screeley44 opened this issue 5 years ago

screeley44 commented 5 years ago

I'm using the alpha4 version.

I keep getting this error when I deploy aws-service-operator.yaml and I'm not sure what config I'm missing; does anyone have any insight? My account has full access and I can manually create and delete SQS queues through the AWS console, so I'm pretty sure this isn't a permission issue related to my account ID.

[root@ip-10-0-30-249 configs (master)]# kubectl get pods --all-namespaces
NAMESPACE              NAME                                                   READY   STATUS             RESTARTS   AGE
aws-service-operator   aws-service-operator-64d649cd8f-p7qw5                  0/1     CrashLoopBackOff   1          9s
default                kube2iam-8vkzg                                         1/1     Running            0          3m
default                kube2iam-mzr2b                                         1/1     Running            0          3m
[root@ip-10-0-30-249 configs (master)]# kubectl logs aws-service-operator-64d649cd8f-p7qw5 -n aws-service-operator
time="2019-02-27T19:49:27Z" level=fatal msg="AccessDenied: Access to the resource https://sqs.us-east-1.amazonaws.com/ is denied.\n\tstatus code: 403, request id: fd1e7ab5-4ab6-5fe3-bcf5-2e45a6b1dc16"

Could it be a CloudFormation stack issue? I have an aws-service-operator-role stack created with the following WorkerArn: arn:aws:iam::<hidden>:role/nodes.screeley-aws1.screeley.sysdeseng.com

One thing I've noticed: if I trace the CreateQueue request through CloudTrail, I see that the request comes from an account I don't recognize and it gets the access denied error. Any idea what I'm missing to tie kube2iam and the CloudFormation stack to my operator service account?

As for kube2iam, I have deployed it with several different base ARNs and they all result in the same behavior as above and the same basic output in the kube2iam logs:

#            - "--base-role-arn=arn:aws:iam::<hidden>:role/aws-service-operator"
#            - "--base-role-arn=arn:aws:iam::<hidden>:role/nodes.screeley-aws1.screeley.sysdeseng.com"
            - "--auto-discover-base-arn"   <-- this is the last way I tried when deploying kube2iam

[root@ip-10-0-30-249 configs (master)]# kubectl logs kube2iam-8vkzg
time="2019-02-27T19:45:49Z" level=info msg="base ARN autodetected, arn:aws:iam::<hidden>:role/"
time="2019-02-27T19:45:49Z" level=info msg="Listening on port 8181"

If I don't use --auto-discover-base-arn, I just get "Listening on port 8181". I don't really care about SQS; I'm just trying to get the default operator running so I can experiment with it.

christopherhein commented 5 years ago

I think when you use kube2iam and you specify the base role ARN, it's meant to not include the role name. So instead of --base-role-arn=arn:aws:iam::<hidden>:role/aws-service-operator, use --base-role-arn=arn:aws:iam::<hidden>:role/; the pod spec then references just the role name and the two are concatenated.
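
Roughly, the two halves look like this (a sketch rather than the exact manifests; the iam.amazonaws.com/role annotation key is kube2iam's, and the aws-service-operator role name comes from the CloudFormation stack you mentioned):

# kube2iam DaemonSet container args: base ARN only, no role name
            - "--base-role-arn=arn:aws:iam::<hidden>:role/"

# aws-service-operator Deployment pod template: role name only,
# kube2iam appends it to the base ARN when the pod requests credentials
  template:
    metadata:
      annotations:
        iam.amazonaws.com/role: aws-service-operator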

screeley44 commented 5 years ago

@christopherhein - thanks, I'll give it a try. What I did to work around the SQS issue for now is to just not deploy it: I rebuilt the operator with the SQS yaml removed from the models, and the operator started right up and I was able to successfully provision a few S3 buckets. I see the config map and service were generated; now I'm wondering how I use these resources in a pod. Are there any simple examples of that?

christopherhein commented 5 years ago

The difficult thing here is that by removing the SQS setup you lose the operator lifecycle: it purely creates the resources but has no knowledge of them after that. SQS is used to receive the events from the CloudFormation stacks; when a stack returns a successful response, the operator reaches out, collects the Outputs, and stores them in a configmap and in the actual resource under an outputs JSON key.

The common example is https://aws.amazon.com/blogs/opensource/aws-service-operator-kubernetes-available/, where you'll see the pod in the example:

---
apiVersion: service-operator.aws/v1alpha1
kind: DynamoDB
metadata:
  name: dynamo-table
spec:
  hashAttribute:
    name: name
    type: S
  rangeAttribute:
    name: created_at
    type: S
  readCapacityUnits: 5
  writeCapacityUnits: 5

---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: http-server
    name: http
  type: LoadBalancer

---
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: christopherhein/dynamoapp:latest
        imagePullPolicy: Always
        env:
        - name: TABLE_NAME
          valueFrom:
            configMapKeyRef:
              name: dynamo-table
              key: tableName
        resources:
          requests:
            memory: "512Mi"
            cpu: "512m"
        ports:
        - name: http-server
          containerPort: 8080

You'll notice in this that the pod uses a configmap which isn't declared anywhere; that configmap is created by the operator following successful messages from the CFN stacks.
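
For illustration, the generated configmap looks roughly like this; the name and the tableName key come from the pod spec above, while the value is whatever the CloudFormation stack actually outputs:

apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamo-table
data:
  tableName: dynamo-table    # illustrative value; taken from the stack Outputs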

screeley44 commented 5 years ago

OK, so without SQS the S3 operator did successfully create my bucket and config map (output below for reference), but I'm not sure if you have a sample app similar to the dynamoapp above to test out the S3 side of things?

[root@ip-10-0-30-249 ~]# kubectl describe cm screeley-test-aws-operator
Name:         screeley-test-aws-operator
Namespace:    default
Labels:       <none>
Annotations:  <none>

Data
====
bucketARN:
----
arn:aws:s3:::screeley-test-aws-operator
bucketName:
----
screeley-test-aws-operator
bucketURL:
----
screeley-test-aws-operator.s3-us-east-1.amazonaws.com
region:
----
us-east-1
serviceName:
----
screeley-test-aws-operator
websiteURL:
----
http://screeley-test-aws-operator.s3-website-us-east-1.amazonaws.com
Events:  <none>

christopherhein commented 5 years ago

Wow, that's fantastic, and I'm not sure how that's possible LOL. But either way, amazing. There might be some other SQS work that is built in. I do have an S3 example; we're in the process of updating the workshop to include it. This sample uses DynamoDB and S3 together, and it requires a couple of manual steps I want to remove, such as getting the load balancer address from the service, but you can step through it here: https://eksworkshop.com/pr-62/operator/
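
In the meantime, wiring the S3 configmap into a pod works the same way as the dynamo example above. Here's a minimal sketch using the configmap name and keys from your describe output (the env var names are arbitrary placeholders):

        env:
        - name: BUCKET_NAME
          valueFrom:
            configMapKeyRef:
              name: screeley-test-aws-operator
              key: bucketName
        - name: BUCKET_REGION
          valueFrom:
            configMapKeyRef:
              name: screeley-test-aws-operator
              key: region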

marcindulak commented 5 years ago

Also getting the same problem with https://github.com/awslabs/aws-service-operator/releases/tag/v0.0.1-alpha4

kubectl -n aws-service-operator logs aws-service-operator-56c7c574dc-xwkdj
time="2019-06-27T14:15:12Z" level=fatal msg="AccessDenied: Access to the resource https://sqs.eu-west-1.amazonaws.com/ is denied.\n\tstatus code: 403, request id: 31df0c03-a9b5-524c-8c18-598d7805fa1f"

kube2iam is deployed with helm, and uses base-role-arn=arn:aws:iam::XXXXXXXXXXXX:role/ and extraArgs.default-role=kube2iam (see https://github.com/helm/charts/blob/eadd6157eba1f7b387f254583213a6515810f42d/stable/kube2iam/values.yaml#L2-L3)

kubectl -n kube-system logs kube2iam-4g4ts 
time="2019-06-27T18:34:56Z" level=info msg="Listening on port 8181"

Update: the problem seems to be due to an improperly set up kube2iam. iptables: true and the interface corresponding to the CNI in use need to be set for kube2iam. There is an example that can be modified to verify whether the iam.amazonaws.com/role: aws-service-operator role is actually assumed by the pod while testing, e.g. by accessing a private S3 bucket: https://github.com/jtblin/kube2iam/issues/58#issuecomment-286861430
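
With the helm chart that means overriding roughly these values (a sketch against stable/kube2iam; the value names are taken from its values.yaml as far as I can tell, and the interface pattern depends on which CNI you run):

host:
  iptables: true
  interface: eni+        # amazon-vpc-cni; use cali+ for Calico, cbr0 for kubenet, etc.
extraArgs:
  base-role-arn: arn:aws:iam::XXXXXXXXXXXX:role/
  default-role: kube2iam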

On the other hand, I had no luck with:

rm -f models/sqsqueue.yaml
go get -u github.com/jteeuwen/go-bindata/...
make install-aws-codegen
make rebuild

The resulting image, when deployed, still wanted to create an SQS queue.