RADAR-base / RADAR-K8s-Infrastructure

Streamline RADAR-base K8s deployment with cloud infrastructure provisioning
Apache License 2.0

Getting credentials: exec: executable aws failed with exit code 255 #5

Open keyvaann opened 9 months ago

keyvaann commented 9 months ago

Running terraform apply in the cluster directory fails with this error message:

│ Error: Have got the following error while validating the existence of the ConfigMap "aws-auth": Get "https://xxx.gr7.eu-west-2.eks.amazonaws.com/api/v1/namespaces/kube-system/configmaps/aws-auth": getting credentials: exec: executable aws failed with exit code 255
│ 
│   with module.eks.kubernetes_config_map_v1_data.aws_auth[0],
│   on .terraform/modules/eks/main.tf line 553, in resource "kubernetes_config_map_v1_data" "aws_auth":
│  553: resource "kubernetes_config_map_v1_data" "aws_auth" {

Upon rerunning terraform apply, it appears to be failing to create this resource:

  # module.eks.kubernetes_config_map_v1_data.aws_auth[0] will be created
  + resource "kubernetes_config_map_v1_data" "aws_auth" {
      + data          = {
          + "mapAccounts" = jsonencode([])
          + "mapRoles"    = <<-EOT
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::xxx:role/dmz-eks-node-group-xxx"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:bootstrappers"
                  - "system:nodes"
                  "rolearn": "arn:aws:iam::xxx:role/worker-eks-node-group-xxx"
                  "username": "system:node:{{EC2PrivateDNSName}}"
                - "groups":
                  - "system:masters"
                  "rolearn": "arn:aws:iam::xxx:role/connect-prod-radar-base-admin-role"
                  "username": "xxx-radar-base-admin-role"
            EOT
          + "mapUsers"    = jsonencode([])
        }
      + field_manager = "Terraform"
      + force         = true
      + id            = (known after apply)

      + metadata {
          + name      = "aws-auth"
          + namespace = "kube-system"
        }
    }
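The plan above is what the terraform-aws-eks module renders when it manages the aws-auth ConfigMap. As a rough sketch, assuming the cluster directory uses terraform-aws-eks v19-style inputs (the module version and the omitted arguments are assumptions; the role ARN stays elided as in the plan), the admin mapping would come from something like:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0" # assumption: v19-style aws-auth inputs

  # ... cluster_name, vpc_id, subnet_ids, eks_managed_node_groups, etc. ...

  # Have the module write kube-system/aws-auth. This is the
  # kubernetes_config_map_v1_data.aws_auth resource in the plan, so it
  # goes through the kubernetes provider (and therefore the aws CLI).
  manage_aws_auth_configmap = true

  # Extra roles mapped into aws-auth. The node-group roles in the plan
  # appear to be added by the module itself for managed node groups.
  aws_auth_roles = [
    {
      rolearn  = "arn:aws:iam::xxx:role/connect-prod-radar-base-admin-role"
      username = "xxx-radar-base-admin-role"
      groups   = ["system:masters"]
    },
  ]
}
```

Because this resource talks to the Kubernetes API rather than the AWS API, it can fail on credentials even after the EKS cluster itself has been created successfully.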
baixiac commented 9 months ago

Which version of aws-cli have you installed? Try upgrading to a more recent version and see if that helps.

https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html

keyvaann commented 9 months ago

I updated my aws-cli to version 2.14.5 and am still getting the same error. I'm using Terraform v1.6.5.

baixiac commented 9 months ago

Another thing: we assume users already have sufficient permissions before applying the template. Can you check whether your IAM user/role has eks:DescribeCluster and eks:UpdateClusterConfig? I can see your cluster was created; does the following command run successfully?

aws eks --region <region> update-kubeconfig --name <eks-cluster-name>

keyvaann commented 9 months ago

Yes, that command runs successfully, and I'm an admin in the AWS account, so access probably isn't the issue.

baixiac commented 9 months ago

Here are the versions installed on my machine; I hope they help narrow down the problem on your side:

Terraform v1.4.2

+ provider registry.terraform.io/hashicorp/aws v5.0.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.24.0
+ provider registry.terraform.io/hashicorp/time v0.9.2
+ provider registry.terraform.io/hashicorp/tls v4.0.5
baixiac commented 9 months ago

I have tested with both Terraform v1.5.7 and v1.6.5 and got no errors. I think it is time to check the API logs in AWS CloudTrail at your site. Feel free to share any redacted logs here.

Terraform v1.5.7

+ provider registry.terraform.io/hashicorp/aws v5.0.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.24.0
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5

Terraform v1.6.5

+ provider registry.terraform.io/hashicorp/aws v5.0.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.24.0
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5
keyvaann commented 9 months ago

Here are my versions:

Terraform v1.6.5
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v5.0.1
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/kubernetes v2.24.0
+ provider registry.terraform.io/hashicorp/time v0.9.2
+ provider registry.terraform.io/hashicorp/tls v4.0.5

According to terraform-aws-modules/terraform-aws-eks#2009, this issue appears to happen when you have multiple profiles in your AWS config. Adding a profile setting resolved the issue; I'll make a PR later.
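The aws-auth resource is written through the Kubernetes provider, which gets a token by running the aws CLI as an exec credential plugin; exit code 255 is the CLI itself failing, which according to the linked issue can happen when it resolves a different profile from the one Terraform used. A minimal sketch of the profile fix, assuming terraform-aws-eks v19 output names and hypothetical aws_profile/aws_region variables (not necessarily what the PR will use):

```hcl
# Hypothetical variables; not necessarily defined in this repository.
variable "aws_profile" {
  type        = string
  description = "Named profile from ~/.aws/config for both Terraform and the EKS token call"
}

variable "aws_region" {
  type        = string
  description = "Region of the EKS cluster"
}

provider "aws" {
  region  = var.aws_region
  profile = var.aws_profile
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # Pass the same profile and region explicitly so the CLI does not
    # fall back to a different profile when several are configured.
    args = [
      "eks", "get-token",
      "--cluster-name", module.eks.cluster_name,
      "--region", var.aws_region,
      "--profile", var.aws_profile,
    ]
  }
}
```

Alternatively, the exec block accepts an env map (for example env = { AWS_PROFILE = var.aws_profile }), or AWS_PROFILE / AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY can be exported in the shell before running terraform apply, since the plugin inherits the environment of the Terraform process.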

baixiac commented 9 months ago

Oh cool. That's why I think using AWS_* environment variables is more explicit than using profiles.

baixiac commented 3 months ago

I have to confess I lost track of this issue. It looks like this can be closed following your update to the README? @keyvaann

keyvaann commented 3 months ago

I think #13 needs to be merged for this issue to be fixed.

baixiac commented 3 months ago

Alright, can you please rebase and test it on your site? Happy to get it merged and clear this issue.

keyvaann commented 2 months ago

I can't do that at the moment since I'm busy with a few projects; hopefully I'll have time for it in a couple of weeks.