aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

Getting 'AccessDeniedException' #177

Open jjorissen52 opened 3 years ago

jjorissen52 commented 3 years ago

Duplicate of https://github.com/aws/aws-for-fluent-bit/issues/155. Opening a new issue because the resolution of that issue was not shared.

Docker Version: amazon/aws-for-fluent-bit:2.10.0
Kubernetes Version: GitVersion:"v1.19.8-eks-96780e"

Logs (cluster-name as a stand-in for the actual cluster name):

[2021/05/12 19:36:11] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group /aws/containerinsights/cluster-name/application
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.0] CreateLogGroup API responded with error='AccessDeniedException'
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Failed to create log group
[2021/05/12 19:36:11] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] Creating log group /aws/containerinsights/cluster-name/dataplane
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.1] CreateLogGroup API responded with error='AccessDeniedException'
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.1] Failed to create log group
[2021/05/12 19:36:11] [ info] [output:cloudwatch_logs:cloudwatch_logs.2] Creating log group /aws/containerinsights/cluster-name/host
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.2] CreateLogGroup API responded with error='AccessDeniedException'
[2021/05/12 19:36:11] [error] [output:cloudwatch_logs:cloudwatch_logs.2] Failed to create log group

We are seeing this error even with CloudWatch and CloudWatch Logs full access granted. The role with these permissions was attached by modifying the YAML from the fluent-bit quickstart like so (added lines are marked with +):

# create amazon-cloudwatch namespace
apiVersion: v1
kind: Namespace
metadata:
  name: amazon-cloudwatch
  labels:
    name: amazon-cloudwatch
---

# create cwagent service account and role binding
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
  + annotations:
  +  eks.amazonaws.com/role-arn: arn:aws:iam::<account_id>:role/dev_eks_role
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role
rules:
  - apiGroups: [""]
    resources: ["pods", "nodes", "endpoints"]
    verbs: ["list", "watch"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["list", "watch"]
  - apiGroups: [""]
    resources: ["nodes/proxy"]
    verbs: ["get"]
  - apiGroups: [""]
    resources: ["nodes/stats", "configmaps", "events"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["cwagent-clusterleader"]
    verbs: ["get","update"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cloudwatch-agent-role-binding
subjects:
  - kind: ServiceAccount
    name: cloudwatch-agent
    namespace: amazon-cloudwatch
roleRef:
  kind: ClusterRole
  name: cloudwatch-agent-role
  apiGroup: rbac.authorization.k8s.io

Any advice would be greatly appreciated.

DrewZhang13 commented 3 years ago

What are the ClusterRole and ClusterRoleBinding in your YAML file?

jjorissen52 commented 3 years ago

@DrewZhang13 I have updated my original question to include those sections.

jjorissen52 commented 3 years ago

Attaching this policy to the Role associated with the EC2 instance where the worker nodes are hosted resolved the issue.

image

PettitWesley commented 3 years ago

Hmmm yeah, adding a CW policy to the node group role is a required step.

We should probably add that to the getting started guide... I'll try to get that done...

jjorissen52 commented 3 years ago

It can be found in the prerequisites for the Amazon CloudWatch EKS Insights Quickstart.

image

GeiserX commented 2 years ago

I was facing the same problem until I attached the role to the nodes, but I would prefer not to attach it this way. I used the two placeholders available in the values.yaml file to set a role (in annotations and in cloudWatch.roleArn), but the ServiceAccount remained unchanged, and I suppose it needed to change. Only some other parts of the chart's configuration changed.

mersedsv commented 2 years ago

@DrumSergio On my side it is not working with service accounts either. For testing purposes I granted full access (literally to everything) to the service account's role and was still not able to push logs.

The documentation indicates that this is achievable using a service account for an EKS cluster, but it seems it's not working properly.

Adding CloudWatchAgentServerPolicy to the node role fixed everything and things are working.

chihkaiyu commented 2 years ago

I also had this problem. CloudWatchAgent could send metrics but couldn't send logs. Here is the log from the fluent-bit pod.

[2022/03/07 09:44:57] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group /aws/containerinsights/eks-testnet/application
[2022/03/07 09:44:57] [error] [aws_credentials] Could not read shared credentials file /root/.aws/credentials
[2022/03/07 09:44:57] [error] [aws_credentials] Failed to retrieve credentials for AWS Profile default
[2022/03/07 09:44:57] [ warn] [aws_credentials] No cached credentials are available and a credential refresh is already in progress. The current co-routine will retry.
[2022/03/07 09:44:57] [error] [signv4] Provider returned no credentials, service=logs
[2022/03/07 09:44:57] [error] [aws_client] could not sign request
[2022/03/07 09:44:57] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Failed to create log group

I already attached CloudWatchAgentServerPolicy to the EKS node group IAM role. I think this should be enough to send the logs? If it weren't, the metrics shouldn't get through either.

chihkaiyu commented 2 years ago

Upgrading the Fluent Bit image from 2.10.0, which is the default version in the quick start YAML, to 2.21.6 solved my problem.

hakuna-matatah commented 2 years ago

Quoting @chihkaiyu: "Upgrade fluent bit image from 2.10.0 which is the default version in quick start yaml to 2.21.6 solve my problem."

Upgrading to 2.21.6 led me to another issue: [error] [filter:aws:aws.3] Could not retrieve ec2 metadata from IMDS

hakuna-matatah commented 2 years ago

It looks like updating the image to 2.23.4 and changing the filter to use imds_version v2 instead of v1 solved the errors.

In other words, we should update this file here, which is referenced by the Amazon docs.
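
The two changes described above might look roughly like the fragments below. This is a sketch, not the actual quickstart file: the ConfigMap name, the data key, the `Match` pattern, and the DaemonSet and container names are assumptions, and only the `imds_version` value and the image tag come from this thread. The fragments show just the fields that change, so they are meant to be merged into the existing manifest by hand rather than applied as-is.

```yaml
# Assumed names: fluent-bit-config, dataplane-log.conf, dataplane.*
# Only the imds_version line reflects the change discussed in this thread.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
data:
  dataplane-log.conf: |
    [FILTER]
        Name          aws
        Match         dataplane.*
        imds_version  v2
---
# DaemonSet fragment (assumed names): only the image tag changes,
# every other field of the existing DaemonSet stays as it is.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
spec:
  template:
    spec:
      containers:
        - name: fluent-bit
          image: amazon/aws-for-fluent-bit:2.23.4
```

If the manifest contains more than one `aws` filter section, each one would need the same `imds_version` change.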

PettitWesley commented 2 years ago

@hakuna-matatah I submitted a change to fix the version: https://github.com/aws-samples/amazon-cloudwatch-container-insights/pull/95

magaldima commented 1 year ago

I'm seeing the same issue - namely that using the IAM roles for service accounts on EKS does not seem to be working whereas the documentation makes it seem like it should. I don't want to add the IAM policy to the EKS nodes.

You must also grant IAM permissions to enable your Amazon EKS worker nodes to send metrics and logs to CloudWatch. There are two ways to do this:
- Attach a policy to the IAM role of your worker nodes. This works for both Amazon EKS clusters and other Kubernetes clusters.
- Use an IAM role for service accounts for the cluster, and attach the policy to this role. This works only for Amazon EKS clusters.
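
For the service-account option, the role annotation only helps if it sits on the ServiceAccount that the Fluent Bit pods actually run as. The manifest at the top of this thread annotates `cloudwatch-agent`; if the Fluent Bit DaemonSet runs under a different ServiceAccount, that annotation would not reach it. A minimal sketch, assuming the DaemonSet's `serviceAccountName` is `fluent-bit` (an assumption about the quickstart manifest, so check the actual value) and using a placeholder role name:

```yaml
# Sketch only. `fluent-bit` is an assumed ServiceAccount name; use whatever
# the Fluent Bit DaemonSet's `serviceAccountName` field refers to.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: amazon-cloudwatch
  annotations:
    # Placeholder ARN. The role needs a policy that allows writing to
    # CloudWatch Logs, plus a trust policy that lets the cluster's OIDC
    # provider assume it for this namespace and ServiceAccount.
    eks.amazonaws.com/role-arn: arn:aws:iam::<account_id>:role/<fluent_bit_role>
```

The annotation is only picked up when a pod is created, so existing Fluent Bit pods have to be restarted after the ServiceAccount changes.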

I'm using the stable image tag and I'm seeing this in the container logs:

[2023/05/15 18:04:57] [error] [aws_client] auth error, refreshing creds 92
[2023/05/15 18:04:57] [error] [aws_credentials] Shared credentials file /root/.aws/credentials does not exist

PettitWesley commented 1 year ago

@magaldima are you using the container insights daemonset? I think the docs direct you by default to an option that uses instance/node role.

Enable debug logging and check this too: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#credential-chain-resolution-issues
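
Debug logging is controlled by `Log_Level` in the Fluent Bit `[SERVICE]` section. A minimal sketch of where that setting might live, assuming the configuration is shipped in a ConfigMap named `fluent-bit-config` under the key `fluent-bit.conf` (both names are assumptions about the manifest in use); only the changed line is shown, so merge it into the existing section rather than applying this fragment directly:

```yaml
# Assumed ConfigMap name and data key; Log_Level is the only change.
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: amazon-cloudwatch
data:
  fluent-bit.conf: |
    [SERVICE]
        Log_Level  debug
```

When IAM roles for service accounts is wired up correctly, the pod's environment should also contain `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`. If those variables are missing, the annotation never reached the pod.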