AzBuilder / terrakube

Open source IaC Automation and Collaboration Software.
https://docs.terrakube.io
Apache License 2.0
520 stars 44 forks

Allow using remote execution of uncommitted code locally #1015

Closed igorbrites closed 4 months ago

igorbrites commented 4 months ago

Bug description 🐞

I've read the issue https://github.com/AzBuilder/terrakube/issues/596, though I think my problem is different. When using Terraform Cloud, if you run plans using local (uncommitted) code, this code is somehow uploaded to TFC runners and planned as needed even if the workspace has a VCS connection configured.

Right now I'm evaluating the change to Terrakube (workspaces and modules), and I need to test the connection using Terrakube's OIDC provider created using this code, but if I run a plan locally with the backend changes, it still tries to run the code from my main branch.

I can't set up multiple Terraform tokens on GitHub Actions right now, and I haven't moved the modules there yet, so I need to test the AWS authentication on the executors, but with my local code. How can we achieve this?

Steps to reproduce

Expected behavior

Code gets pushed to the executors and the plan occurs as normal.

Example repository

No response

Anything else?

No response

alfespa17 commented 4 months ago

I think this was fixed here:

https://github.com/AzBuilder/terrakube/issues/661

In a VCS workspace, you just need to add a backend.tf locally like the following, and it will run a speculative plan:

terraform {
  cloud {
    organization = "simple"
    hostname = "8080-azbuilder-terrakube-srvv8ms68ej.ws-us107.gitpod.io"

    workspaces {
      name = "simple-terraform"
    }
  }
}
igorbrites commented 4 months ago

Ok so I got this working as you said, but the credentials are not working. The provider block has an assume_role in it, so my question now is, on the assume role policy, should I add the role or the OIDC provider?

alfespa17 commented 4 months ago

Basically, you only need to add these environment variables in your code and it should work:

image

There is an example of the Terraform code here:

https://github.com/AzBuilder/terrakube/tree/main/dynamic-credential-setup/aws

igorbrites commented 4 months ago

I have that assume_role set up on every backend I have nowadays, and my idea is to give the terrakube role permission to assume all those roles in their specific AWS accounts. One Role to rule them all!

The config looks like this:

Untitled

Assume role policy on that role:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::<OPS-ACCOUNT>:oidc-provider/terrakube.<REDACTED>"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "terrakube.<REDACTED>:aud": "aws.workload.identity"
      },
      "StringLike": {
        "terrakube.<REDACTED>:sub": "organization:*:workspace:*"
      }
    }
  }]
}

Actual role policy:

{
  "Statement": [{
    "Action": ["sts:TagSession", "sts:AssumeRole"],
    "Effect": "Allow",
    "Resource": [
      "arn:aws:iam::<DIFFERENT-ACCOUNT>:role/another-role"
    ]
  }],
  "Version": "2012-10-17"
}

Provider:

provider "aws" {
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::<DIFFERENT-ACCOUNT>:role/another-role"
  }
}

Provider's assume role policy (tried both ways):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<OPS-ACCOUNT>:role/ops-terrakube"
        ]
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<OPS-ACCOUNT>:oidc-provider/terrakube.<REDACTED>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "terrakube.<REDACTED>:aud": "aws.workload.identity"
        },
        "StringLike": {
          "terrakube.<REDACTED>:sub": "organization:*:workspace:*"
        }
      }
    }
  ]
}

And this is the error I see:

│ Error: Invalid provider configuration
│
│ Provider "registry.terraform.io/hashicorp/aws" requires explicit
│ configuration. Add a provider block to the root module and configure the
│ provider's required arguments as described in the provider documentation.
│
╵
╷
│ Error: No valid credential sources found
│
│   with provider["registry.terraform.io/hashicorp/aws"],
│   on <empty> line 0:
│   (source code not available)
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
alfespa17 commented 4 months ago

Quick question: did you add your private and public key to Terrakube?

You will need it in order to generate the aws credentials.

https://docs.terrakube.io/user-guide/workspaces/dynamic-provider-credentials#generate-public-and-private-key

igorbrites commented 4 months ago

Quick question: did you add your private and public key to Terrakube?

Yes, I did. I created them using Terraform, then created a secret on Kubernetes with them, and mounted them inside the API.

Here is the TF Code:

resource "tls_private_key" "this" {
  algorithm = "RSA"
  rsa_bits  = 2048
}

data "tls_public_key" "this" {
  private_key_pem = tls_private_key.this.private_key_pem
}

resource "kubernetes_secret_v1" "certificates" {
  metadata {
    name      = "aws-credentials-certificate"
    namespace = "terrakube"
  }

  data = {
    "private_key" = tls_private_key.this.private_key_pem
    "public_key"  = data.tls_public_key.this.public_key_pem
  }
}

And here are the Helm values:

api:
  volumeMounts:
  - mountPath: /tmp/aws-credentials
    name: aws-credentials-certificate
    readOnly: true
  volumes:
  - name: aws-credentials-certificate
    secret:
      defaultMode: 420
      secretName: aws-credentials-certificate
  env:
  - name: DynamicCredentialPublicKeyPath
    value: /tmp/aws-credentials/public_key
  - name: DynamicCredentialPrivateKeyPath
    value: /tmp/aws-credentials/private_key
alfespa17 commented 4 months ago

Someone had an issue generating the private key with terraform, check this:

https://github.com/AzBuilder/terrakube/issues/839#issuecomment-2104353653

alfespa17 commented 4 months ago

By the way, I think you won't be able to use one role for every workspace.

There is one restriction you need to consider: when AWS checks the JWT generated by Terrakube, it validates the "audience" and the "subject" inside the token, as set in this part of the code:

https://github.com/AzBuilder/terrakube/blob/8ca88dd3ca53c95c0a28294e9bdcf66b34703782/dynamic-credential-setup/aws/main.tf#L31

The subject includes the name of the workspace and the name of the organization, so it will be different for every workspace.

There is a comment related to that when you are using TFC here
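
To illustrate how a condition on the subject behaves, here is a minimal Python sketch; it is not Terrakube or AWS code. It assumes the subject has the shape `organization:<org>:workspace:<workspace>` (inferred from the policy patterns quoted earlier in this thread), uses hypothetical organization and workspace names, and approximates IAM's `StringLike` wildcards with `fnmatchcase`.

```python
from fnmatch import fnmatchcase


def subject_matches(pattern: str, subject: str) -> bool:
    """Approximate an IAM StringLike comparison on the token's `sub` claim.

    fnmatchcase is case-sensitive and supports `*` and `?`; it also treats
    `[...]` as a character set, which IAM does not, so avoid brackets here.
    """
    return fnmatchcase(subject, pattern)


# Hypothetical subjects for two workspaces in one organization.
dev_subject = "organization:simple:workspace:dev-cluster"
prod_subject = "organization:simple:workspace:prod-cluster"

# A pattern pinned to one workspace versus the wildcard pattern quoted earlier.
pinned = "organization:simple:workspace:dev-cluster"
wildcard = "organization:*:workspace:*"

for name, pattern in (("pinned", pinned), ("wildcard", wildcard)):
    for subject in (dev_subject, prod_subject):
        print(name, subject, subject_matches(pattern, subject))
```

A pinned pattern accepts only its own workspace, so each workspace would need its own statement or role; a wildcard pattern accepts any organization and workspace, which is broader than most setups want.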

igorbrites commented 4 months ago

Someone had an issue generating the private key with terraform

Did this change and there's a new error message at least:

│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded
│ maximum number of attempts, 3, https response error StatusCode: 400,
│ RequestID: <REDACTED>, InvalidIdentityToken: No
│ OpenIDConnect provider found in your account for
│ https://terrakube.<REDACTED>

Reading that other doc, I think the issue is that I'll need to have the OIDC Provider on each environment. My idea was to have a single one and let it assume roles across the other envs, but if I understood it correctly, even if I specify the ARN on those env vars, it still tries to find it on the current account.

I'll try to deploy the OIDC on the same account and give it another try, let's see.

Thanks for helping me out! I'll keep you posted.

igorbrites commented 4 months ago

It didn't work:

Identity provider was not added.
Could not connect to https://terrakube.<REDACTED>

The address is behind a private load balancer, so the other account can't see it.

I'll try another way to create this OIDC.

EDIT:

I created the OIDC provider using Terraform on the ANOTHER-ACCOUNT, but now the error is what I was afraid of:

│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded
│ maximum number of attempts, 3, https response error StatusCode: 400,
│ RequestID: <REDACTED>, InvalidIdentityToken:
│ Couldn't retrieve verification key from your identity provider,  please
│ reference AssumeRoleWithWebIdentity documentation for requirements

As the Terrakube API is on another account behind a private load balancer, AWS can't connect to it to fetch those .well-known routes.

alfespa17 commented 4 months ago

The well-known endpoints must be public; that is one restriction when using dynamic credentials.

igorbrites commented 4 months ago

What are the other alternatives to authenticate to AWS from the executors?

alfespa17 commented 4 months ago

What are the other alternatives to authenticate to AWS from the executors?

Your only option will be adding AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION as environment variables in the workspace settings.

igorbrites commented 4 months ago

Ok, I got it, I'll try that then, thanks for all the help!

alfespa17 commented 4 months ago

Another option could be to expose just these two endpoints publicly somewhere else:

https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/openid-configuration
https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/jwks

And you could customize the Terrakube image here to generate the JWT using another domain that is public.

https://github.com/AzBuilder/terrakube/blob/8ca88dd3ca53c95c0a28294e9bdcf66b34703782/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L29

https://github.com/AzBuilder/terrakube/blob/8ca88dd3ca53c95c0a28294e9bdcf66b34703782/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L75

This will require more work but it should work for dynamic credentials.

In the end, AWS just needs to validate the JWT generated by Terrakube (the one used inside terraform/tofu) against the public key information exposed by the endpoints above.
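
As a rough way to check that requirement, here is a minimal Python sketch that builds the two endpoint URLs mentioned above from an issuer URL and offers a helper to probe them. The function names and the example domain are hypothetical, and the probe only tells you something useful when it runs from a machine outside the private network.

```python
import json
import urllib.request


def discovery_urls(issuer: str) -> dict:
    """Build the two well-known URLs from the issuer URL (paths as quoted above)."""
    base = issuer.rstrip("/")
    return {
        "openid_configuration": f"{base}/.well-known/openid-configuration",
        "jwks": f"{base}/.well-known/jwks",
    }


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses.
        return False


# Hypothetical issuer; replace with your own public domain.
for name, url in discovery_urls("https://terrakube.example.com/").items():
    print(name, url)
```

Calling `is_reachable(url)` for each entry from a host on the public internet gives a quick signal of whether an external service could fetch the documents at all; it does not validate their contents.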

igorbrites commented 4 months ago

I went the access/secret key path, with a user that can only assume the roles I need. However, now when I run plans, it's trying to destroy everything. I'll dig into the issues to see if someone already had this problem before.

igorbrites commented 4 months ago

Yeah, no luck so far. I've followed this post, though it still tries to destroy everything. Any tips @alfespa17?

alfespa17 commented 4 months ago

Yeah, no luck so far. I've followed this post, though it still tries to destroy everything. Any tips @alfespa17?

Not sure if I understand correctly: you are trying to migrate something from TFC as I did here, right?

igorbrites commented 4 months ago

Not sure if I understand correctly: you are trying to migrate something from TFC as I did here, right?

Kinda. Right now the state is not on TFC, but on an S3 backend. So, to summarize:

Keep in mind that I'm trying to run this locally without committing the files, on a workspace with VCS plugged in.

alfespa17 commented 4 months ago

Go to the workspace settings in the UI. I guess that when you run the plan, Terraform is trying to execute the code in a different directory, so it thinks that you deleted all the resources.

igorbrites commented 4 months ago

Yes I have a specific folder set up there

image

Though I'm running the plan on the same folder locally.

alfespa17 commented 4 months ago

The path should be /clusters/dev/cluster

igorbrites commented 4 months ago

I just did the change

image

But even so, I see Plan: 0 to add, 0 to change, 83 to destroy.

alfespa17 commented 4 months ago

Can you check the executor logs? Maybe you can find some information there about the directory it is using for the run.

igorbrites commented 4 months ago
These are the logs:

```
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.executor.ExecutorJobImpl - Create Job for Organization Workspace
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - User Home Directory: /home/cnb
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - Workspace git clone directory: /home/cnb/.terraform-spring-boot/executor//
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - vcsType: PUBLIC
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.workspace.SetupWorkspaceImpl - https://github.com/: Authentication is required but no CredentialsProvider has been registered
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.workspace.security.WorkspaceSecurityImpl - Generate Dex Authentication Private Token
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - Executor WorkingDir: /home/cnb/.terraform-spring-boot/executor//
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.executor.ExecutorJobImpl - /home/cnb/.terraform-spring-boot/executor///commitHash.info (No such file or directory)
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.executor.ExecutorJobImpl - Execute Plan for Organization Workspace
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.terraform.TerraformExecutorServiceImpl - Terraform Working Directory: /home/cnb/.terraform-spring-boot/executor///clusters/dev/cluster
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.logs.LogsConsumer - ***************************************
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.logs.LogsConsumer - Initializing Terrakube Job 63 Step 5f0403e3-d47d-4a96-8410-defdc9db27ef
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.logs.LogsConsumer - Running Terraform 1.7.4
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.logs.LogsConsumer - ***************************************
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.logs.LogsConsumer - Running Terraform Init:
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.plugin.tfstate.aws.AwsTerraformStateImpl - Generating backend override file for terraform 1.7.4
[threadPoolTaskExecutor-1] WARN org.terrakube.executor.service.terraform.TerraformExecutorServiceImpl - Not using any SSH key to download modules
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformClient - Creating terraform downloader using terraform release URL: https://releases.hashicorp.com/terraform/index.json and tofu release URL: https://api.github.com/repos/opentofu/opentofu/releases
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Initialize TerraformDownloader using custom URL
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - User Home Directory: /home/cnb
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Validate/Create download temp directory: /home/cnb/.terraform-spring-boot/download/
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Validate/Create terraform directory: /home/cnb/.terraform-spring-boot/terraform/
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - User Home Directory for tofu download: /home/cnb
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Validate/Create tofu download temp directory: /home/cnb/.terraform-spring-boot/download/tofu/
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Validate/Create tofu directory: /home/cnb/.terraform-spring-boot/tofu/
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Downloading terraform releases list
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Deleting Temp /home/cnb/.terraform-spring-boot/terraform-7567742292494670599-release
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Found 328 terraform releases
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Downloading tofu releases list
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Deleting Temp /home/cnb/.terraform-spring-boot/tofu-11835378242115630697-release
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Found 21 tofu releases
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - Downloading terraform version 1.7.4 architecture amd64 Type Linux
[threadPoolTaskExecutor-1] INFO org.terrakube.terraform.TerraformDownloader - terraform_1.7.4_linux_amd64.zip terraform already exists
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer -
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - Initializing the backend...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer -
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - Successfully configured the backend "s3"! Terraform will automatically
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - use this backend unless the backend configuration changes.
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer -
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - Initializing provider plugins...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Finding latest version of hashicorp/tls...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Finding latest version of cloudposse/utils...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Finding latest version of hashicorp/aws...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Finding latest version of hashicorp/kubernetes...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Finding latest version of hashicorp/time...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installing hashicorp/kubernetes v2.31.0...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installed hashicorp/kubernetes v2.31.0 (signed by HashiCorp)
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installing hashicorp/time v0.11.2...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installed hashicorp/time v0.11.2 (signed by HashiCorp)
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installing hashicorp/tls v4.0.5...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installed hashicorp/tls v4.0.5 (signed by HashiCorp)
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installing cloudposse/utils v1.23.0...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installed cloudposse/utils v1.23.0 (self-signed, key ID 7B22D099488F3D11)
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installing hashicorp/aws v5.56.1...
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - - Installed hashicorp/aws v5.56.1 (signed by HashiCorp)
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer -
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - Partner and community providers are signed by their developers.
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - If you'd like to know more about provider signing, you can read about it here:
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - https://www.terraform.io/docs/cli/plugins/signing.html
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer -
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - Terraform has created a lock file .terraform.lock.hcl to record the provider
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - selections it made above. Include this file in your version control repository
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - so that Terraform can guarantee to make the same selections by default when
[ForkJoinPool-1-worker-7] INFO org.terrakube.executor.service.logs.LogsConsumer - you run "terraform init" in the future.
```

Two lines caught my attention:

https://github.com/<REDACTED>: Authentication is required but no CredentialsProvider has been registered
...
Successfully configured the backend "s3"! Terraform will automatically...

So even though I have this locally, it's still getting the backend configuration from the main branch:

terraform {
  required_version = ">= 1.7.4"

  # backend "s3" {
  #   bucket         = "<REDACTED>"
  #   key            = "<REDACTED>/terraform.tfstate"
  #   region         = "us-east-1"
  #   dynamodb_table = "<REDACTED>"
  # }

  cloud {
    hostname = "terrakube.<REDACTED>"
    organization = "<REDACTED>"
    workspaces {
      name = "eks-clusters-dev-cluster"
    }
  }
}
igorbrites commented 4 months ago

I'll reopen the issue as the local plan is still not working.

alfespa17 commented 4 months ago

Hello @igorbrites

I think you are having an authentication issue; check these 2 lines:

[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - vcsType: PUBLIC
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.workspace.SetupWorkspaceImpl - https://github.com/<REDACTED>: Authentication is required but no CredentialsProvider has been registered

Maybe when you created the workspace you didn't select which VCS provider to use when connecting to the repository.

Reference code: https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/executor/src/main/java/org/terrakube/executor/service/workspace/SetupWorkspaceImpl.java#L326
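
For long executor logs, a small filter can surface lines like these. The following is a minimal Python sketch that assumes one log entry per line in the `[thread] LEVEL logger - message` layout seen in this thread; the regular expression and function name are illustrative and not part of Terrakube.

```python
import re

# Layout assumed from the executor output in this thread:
# [thread] LEVEL fully.qualified.Logger - message
LOG_LINE = re.compile(
    r"^\[(?P<thread>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s?(?P<message>.*)$"
)


def problem_lines(log_text, levels=("ERROR", "WARN")):
    """Return (level, short logger name, message) for entries at the given levels."""
    hits = []
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match and match.group("level") in levels:
            short_logger = match.group("logger").rsplit(".", 1)[-1]
            hits.append((match.group("level"), short_logger, match.group("message")))
    return hits


sample = """\
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - vcsType: PUBLIC
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.workspace.SetupWorkspaceImpl - https://github.com/: Authentication is required but no CredentialsProvider has been registered
[threadPoolTaskExecutor-1] WARN org.terrakube.executor.service.terraform.TerraformExecutorServiceImpl - Not using any SSH key to download modules
"""

for level, logger, message in problem_lines(sample):
    print(level, logger, message)
```

Filtering on `ERROR` and `WARN` first narrows a long run down to a handful of entries, which can then be read in context in the full log.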

igorbrites commented 4 months ago

Ok, I'll try to recreate the workspace, but IIRC I created the workspace setting up the VCS connection. Let's see.

alfespa17 commented 4 months ago

Ok, I'll try to recreate the workspace, but IIRC I created the workspace setting up the VCS connection. Let's see.

Try running a plan from the UI once you create the workspace again

igorbrites commented 4 months ago

It worked! Though I hadn't seen your message earlier, so I did some tests of my own:

So the bottom line is that you need to add the starting / when creating the workspace, and updating it won't have any effect.

igorbrites commented 4 months ago

Quick question: how do I add -refresh=false to a Plan template? The Kubernetes provider has an old issue where it doesn't connect to the cluster we set on the provider, and a way to bypass it is to add -refresh=false to the plan. Actually that was my very first issue opened here 😄

EDIT:

Should I create a customScripts running the terraform plan -refresh=false?

alfespa17 commented 4 months ago

There is no way to add that parameter to a template for now.

You can add -refresh=false when you are using the CLI-driven workflow in local or remote mode.

I think the only alternative for now could be to create a "Job" using the API and send these two parameters (I never tested that):

https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/api/src/main/java/org/terrakube/api/rs/job/Job.java#L77
https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/api/src/main/java/org/terrakube/api/rs/job/Job.java#L83

By the way, customScript won't work because Terrakube won't be able to save all the information that it needs internally here:

https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/executor/src/main/java/org/terrakube/executor/service/terraform/TerraformExecutorServiceImpl.java#L116

igorbrites commented 4 months ago

Cool, I'll test it out later on, for now, I'll resume my Terrakube evaluation, and as the plan is working I'll close out this issue (again 😬). Thanks again for all the help and troubleshooting!