I think this was fixed here:
https://github.com/AzBuilder/terrakube/issues/661
In a VCS workspace you just need to add your backend.tf locally like the following, and it will run a speculative plan:
terraform {
  cloud {
    organization = "simple"
    hostname     = "8080-azbuilder-terrakube-srvv8ms68ej.ws-us107.gitpod.io"
    workspaces {
      name = "simple-terraform"
    }
  }
}
Ok, so I got this working as you said, but the credentials are not working. The provider block has an assume_role in it, so my question now is: in the assume role policy, should I add the role or the OIDC provider?
Basically you only need to add these environment variables in your code and it should work.
There is one example terraform code here
https://github.com/AzBuilder/terrakube/tree/main/dynamic-credential-setup/aws
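For reference, here is a minimal Terraform sketch of the AWS side that such a setup needs. The hostname terrakube.example.com, the thumbprint, and the role name are placeholders rather than values from this thread, so treat the linked repository as the source of truth:
resource "aws_iam_openid_connect_provider" "terrakube" {
  url             = "https://terrakube.example.com"               # must be publicly reachable
  client_id_list  = ["aws.workload.identity"]                     # the audience Terrakube puts in its JWTs
  thumbprint_list = ["0000000000000000000000000000000000000000"]  # replace with the real CA thumbprint
}

data "aws_iam_policy_document" "terrakube_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.terrakube.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "terrakube.example.com:aud"
      values   = ["aws.workload.identity"]
    }

    condition {
      test     = "StringLike"
      variable = "terrakube.example.com:sub"
      values   = ["organization:*:workspace:*"]
    }
  }
}

resource "aws_iam_role" "terrakube" {
  name               = "terrakube-dynamic-credentials" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.terrakube_trust.json
}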
I have that assume_role set up on every backend I have nowadays, and my idea is to give the Terrakube role permission to assume all those roles in their specific AWS accounts. One Role to rule them all!
The config looks like this:
Assume role policy on that role:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::<OPS-ACCOUNT>:oidc-provider/terrakube.<REDACTED>"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "terrakube.<REDACTED>:aud": "aws.workload.identity"
      },
      "StringLike": {
        "terrakube.<REDACTED>:sub": "organization:*:workspace:*"
      }
    }
  }]
}
Actual role policy:
{
  "Statement": [{
    "Action": ["sts:TagSession", "sts:AssumeRole"],
    "Effect": "Allow",
    "Resource": [
      "arn:aws:iam::<DIFFERENT-ACCOUNT>:role/another-role"
    ]
  }],
  "Version": "2012-10-17"
}
Provider:
provider "aws" {
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::<DIFFERENT-ACCOUNT>:role/another-role"
  }
}
Provider's assume role policy (tried both ways):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::<OPS-ACCOUNT>:role/ops-terrakube"
        ]
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<OPS-ACCOUNT>:oidc-provider/terrakube.<REDACTED>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "terrakube.<REDACTED>:aud": "aws.workload.identity"
        },
        "StringLike": {
          "terrakube.<REDACTED>:sub": "organization:*:workspace:*"
        }
      }
    }
  ]
}
And this is the error I see:
│ Error: Invalid provider configuration
│
│ Provider "registry.terraform.io/hashicorp/aws" requires explicit
│ configuration. Add a provider block to the root module and configure the
│ provider's required arguments as described in the provider documentation.
│
│
│
│ Error: No valid credential sources found
│
│ with provider["registry.terraform.io/hashicorp/aws"],
│ on <empty> line 0:
│ (source code not available)
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
Quick question: did you add your private and public key to Terrakube?
You will need them in order to generate the AWS credentials.
> Quick question: did you add your private and public key to Terrakube?
Yes, I did. I created them using Terraform, then created a secret on Kubernetes with them, and mounted them inside the API.
Here is the TF Code:
resource "tls_private_key" "this" {
  algorithm = "RSA"
  rsa_bits  = 2048
}

data "tls_public_key" "this" {
  private_key_pem = tls_private_key.this.private_key_pem
}

resource "kubernetes_secret_v1" "certificates" {
  metadata {
    name      = "aws-credentials-certificate"
    namespace = "terrakube"
  }

  data = {
    "private_key" = tls_private_key.this.private_key_pem
    "public_key"  = data.tls_public_key.this.public_key_pem
  }
}
And here are the Helm values:
api:
  volumeMounts:
    - mountPath: /tmp/aws-credentials
      name: aws-credentials-certificate
      readOnly: true
  volumes:
    - name: aws-credentials-certificate
      secret:
        defaultMode: 420
        secretName: aws-credentials-certificate
  env:
    - name: DynamicCredentialPublicKeyPath
      value: /tmp/aws-credentials/public_key
    - name: DynamicCredentialPrivateKeyPath
      value: /tmp/aws-credentials/private_key
Someone had an issue generating the private key with terraform, check this:
https://github.com/AzBuilder/terrakube/issues/839#issuecomment-2104353653
By the way, I think you won't be able to use one role for every workspace.
There is one restriction that you need to consider: when AWS checks the JWT token generated by Terrakube, it will validate the "audience" and the "subject" inside the token, based on this part of the code:
The subject includes the name of the organization and the name of the workspace, so it will be different for every workspace.
There is a comment related to that when you are using TFC here.
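To illustrate the subject format (hostname, account id, and organization name below are placeholders, not values from this thread): the subject claim follows the pattern organization:<organization>:workspace:<workspace>, so a trust condition can pin the organization while wildcarding the workspace:
data "aws_iam_policy_document" "single_org_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      # Placeholder account id and hostname.
      identifiers = ["arn:aws:iam::111111111111:oidc-provider/terrakube.example.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "terrakube.example.com:aud"
      values   = ["aws.workload.identity"]
    }

    condition {
      test     = "StringLike"
      variable = "terrakube.example.com:sub"
      # Pin the organization, wildcard the workspace name.
      values   = ["organization:simple:workspace:*"]
    }
  }
}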
> Someone had an issue generating the private key with terraform
I made that change, and there's a new error message at least:
│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded
│ maximum number of attempts, 3, https response error StatusCode: 400,
│ RequestID: <REDACTED>, InvalidIdentityToken: No
│ OpenIDConnect provider found in your account for
│ https://terrakube.<REDACTED>
Reading that other doc, I think the issue is that I'll need to have the OIDC Provider on each environment. My idea was to have a single one and let it assume roles across the other envs, but if I understood it correctly, even if I specify the ARN on those env vars, it still tries to find it on the current account.
I'll try to deploy the OIDC on the same account and give it another try, let's see.
Thanks for helping me out! I'll keep you posted.
It didn't work:
Identity provider was not added.
Could not connect to https://terrakube.<REDACTED>
The address is behind a private load balancer, so the other account can't see it.
I'll try another way to create this OIDC.
EDIT:
I created the OIDC using terraform on the ANOTHER-ACCOUNT, but now the error is what I was afraid of:
│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded
│ maximum number of attempts, 3, https response error StatusCode: 400,
│ RequestID: <REDACTED>, InvalidIdentityToken:
│ Couldn't retrieve verification key from your identity provider, please
│ reference AssumeRoleWithWebIdentity documentation for requirements
As the Terrakube API is in another account behind a private load balancer, AWS can't connect to it to get those .well-known routes.
The well-known endpoint should be public; that is one restriction when using dynamic credentials.
What are the other alternatives to authenticate to AWS from the executors?
> What are the other alternatives to authenticate to AWS from the executors?
Your only option will be adding the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION in the workspace settings.
Ok, I got it, I'll try that then, thanks for all the help!
Another option could be to just expose these two endpoints publicly somewhere else:
https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/openid-configuration https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/jwks
And you could do some customization of the Terrakube image here to generate the JWT using another domain that is public.
This will require more work, but it should work for dynamic credentials.
In the end, AWS just needs to validate the JWT token generated by Terrakube (the one used inside terraform/tofu) against the public key that is exposed via the above endpoints.
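A rough Terraform sketch of that idea, exposing only the two discovery paths through a public ingress; the service name, port, ingress class, and hostname are assumptions, not values from this thread or the Terrakube Helm chart:
resource "kubernetes_ingress_v1" "terrakube_oidc_discovery" {
  metadata {
    name      = "terrakube-oidc-discovery" # hypothetical name
    namespace = "terrakube"
  }

  spec {
    ingress_class_name = "public" # assumption: an internet-facing ingress class exists

    rule {
      host = "terrakube.mysuperdomain.com" # the public hostname mentioned above

      http {
        # Expose only the OIDC discovery document and the JWKS endpoint;
        # everything else on the API stays behind the private load balancer.
        path {
          path      = "/.well-known/openid-configuration"
          path_type = "Exact"
          backend {
            service {
              name = "terrakube-api-service" # assumption: adjust to the real API service name
              port {
                number = 8080
              }
            }
          }
        }

        path {
          path      = "/.well-known/jwks"
          path_type = "Exact"
          backend {
            service {
              name = "terrakube-api-service"
              port {
                number = 8080
              }
            }
          }
        }
      }
    }
  }
}
The JWT issuer embedded by the API would still need to match that public hostname, which is the image customization mentioned above.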
I went the access/secret key path, with a user that can only assume the roles I need. However, now when I run plans, it tries to destroy everything. I'll dig into the issues to see if someone already had this problem before.
Yeah, no luck so far. I've followed this post, though it still tries to destroy everything. Any tips @alfespa17?
Not sure if I understand correctly; you are trying to migrate something from TFC as I did here, right?
Kinda. Right now the state is not on TFC, but on an S3 backend. So, summarizing:
- I have a state file on S3;
- I pulled the state file locally;
- Removed .terraform;
- Removed the S3 backend reference from the code locally, and added a cloud one:
  cloud {
    hostname     = "terrakube.<REDACTED>"
    organization = "<REDACTED>"
    workspaces {
      name = "eks-clusters-dev-cluster"
    }
  }
- Ran terraform init;
- Terraform then asked me if I wanted to migrate the state, and I answered yes;
- For some reason, if I click on States inside the workspace on Terrakube, I only see "";
- So I pulled the S3 file again and ran terraform state push terraform.tfstate;
- Now I see the actual state on Terrakube;
- Tried a terraform plan, and it shows Plan: 0 to add, 0 to change, 83 to destroy.
Remember that I'm trying to run this locally without committing the files, on a workspace with VCS plugged in.
Go to the workspace settings in the UI. I guess that when you are doing the plan, Terraform is trying to execute the code in a different directory, so it thinks you deleted all the resources.
Yes, I have a specific folder set up there, though I'm running the plan from the same folder locally.
The path should be /clusters/dev/cluster.
I just made the change, but even so, I see Plan: 0 to add, 0 to change, 83 to destroy.
Can you check the executor logs? Maybe you can find some information there about the directory it is using to run.
Two lines caught my attention:
https://github.com/<REDACTED>: Authentication is required but no CredentialsProvider has been registered
...
Successfully configured the backend "s3"! Terraform will automatically...
So even though locally I have this, it's still getting the backend configuration from the main branch:
terraform {
  required_version = ">= 1.7.4"

  # backend "s3" {
  #   bucket         = "<REDACTED>"
  #   key            = "<REDACTED>/terraform.tfstate"
  #   region         = "us-east-1"
  #   dynamodb_table = "<REDACTED>"
  # }

  cloud {
    hostname     = "terrakube.<REDACTED>"
    organization = "<REDACTED>"
    workspaces {
      name = "eks-clusters-dev-cluster"
    }
  }
}
I'll reopen the issue as the local plan is still not working.
Hello @igorbrites
I think you are having an authentication issue, check these two lines:
[threadPoolTaskExecutor-1] INFO org.terrakube.executor.service.workspace.SetupWorkspaceImpl - vcsType: PUBLIC
[threadPoolTaskExecutor-1] ERROR org.terrakube.executor.service.workspace.SetupWorkspaceImpl - https://github.com/<REDACTED>: Authentication is required but no CredentialsProvider has been registered
Maybe when you created the workspace you didn't select which VCS provider to use when connecting to the repository.
Ok, I'll try to recreate the workspace, but IIRC I created the workspace setting up the VCS connection. Let's see.
> Ok, I'll try to recreate the workspace, but IIRC I created the workspace setting up the VCS connection. Let's see.
Try running a plan from the UI once you create the workspace again
It worked! Though I hadn't seen your message earlier, so I did some tests of my own:
- Recreated the workspace with the / at the beginning of the path, plus the VCS connection, and it worked.
- Recreated it without the / this time, and it didn't work, even after I added the / afterward.
So the bottom line is that you need to add the starting / when creating the workspace; updating it afterward won't have any effect.
Quick question: how do I add -refresh=false to a Plan template? The Kubernetes provider has an old issue where it doesn't connect to the cluster we set on the provider, and a way to bypass it is to add -refresh=false to the plan. Actually, that was my very first issue opened here 😄
EDIT: Should I create a customScripts step running terraform plan -refresh=false?
There is no way to add that parameter to a template for now.
You can add -refresh=false when you are using the CLI-driven workflow in local or remote mode.
I think the only alternative for now could be to create a "Job" using the API and send these two parameters (I never tested that):
https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/api/src/main/java/org/terrakube/api/rs/job/Job.java#L77 https://github.com/AzBuilder/terrakube/blob/fe6b29ee5ed1f7b1aefb1e85ecb1cab8d1f75fcb/api/src/main/java/org/terrakube/api/rs/job/Job.java#L83
By the way, customScripts won't work because Terrakube won't be able to save all the information that it needs internally here.
Cool, I'll test it out later on. For now, I'll resume my Terrakube evaluation, and since the plan is working I'll close out this issue (again 😬). Thanks again for all the help and troubleshooting!
Bug description 🐞
I've read the issue https://github.com/AzBuilder/terrakube/issues/596, though I think my problem is different. When using Terraform Cloud, if you run plans using local (uncommitted) code, this code is somehow uploaded to TFC runners and planned as needed even if the workspace has a VCS connection configured.
Right now I'm evaluating the change to Terrakube (workspaces and modules), and I need to test the connection using Terrakube's OIDC provider created using this code, but if I run a plan locally with the backend changes, it still tries to run the code from my main branch.
I can't set up multiple Terraform tokens on GitHub Actions right now, and I haven't moved the modules there yet, so I need to test the AWS authentication on the executors, but with my local code. How can we achieve that?
Steps to reproduce
Expected behavior
Code gets pushed to the executors and the plan occurs as normal.
Example repository
No response
Anything else?
No response