AzBuilder / terrakube

Open source IaC Automation and Collaboration Software.
https://docs.terrakube.io
Apache License 2.0

Dynamic Credentials (AWS) Not working on terrakube 2.22 #1425

Closed isbig5 closed 3 weeks ago

isbig5 commented 3 weeks ago

Feedback

Hey guys, I am facing an issue while setting up my Terrakube environment.

I was trying to set up dynamic credentials and followed everything in the Dynamic Credentials documentation. I created the public/private keys, copied them into the API container via docker compose, and then set the two new variables to the key paths inside the API container:

[image]

Once the API container was deployed, I logged into its shell and checked the folders, and the keys were there.

Then I went to "terrakube-api.mydomain.com/.well-known/jwks" and the other path, "terrakube-api.mydomain.com/.well-known/openid-configuration", and both gave the expected responses described in the docs.

I ran the dynamic credentials Terraform, creating the policy, role and OIDC config in my AWS account. The only thing I changed was the name of the workspace I was working with.

Here is my trust relationship (real values replaced for privacy reasons):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::1234567890:oidc-provider/terrakube-api.mydomain.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "terrakube-api.mydomain.com:sub": "organization:company-develop:workspace:iac-terrakube-teste",
                    "terrakube-api.mydomain.com:aud": "aws.workload.identity"
                }
            }
        }
    ]
}

My original intention was to define the AWS environment variables as global variables, but the docs define them on the workspace, so I did the same.

I ran a new plan to create an S3 bucket and got this error. What am I doing wrong here?

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: No valid credential sources found
│
│   with provider["registry.terraform.io/hashicorp/aws"],
│   on main.tf line 1, in provider "aws":
│    1: provider "aws" {
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, https response
│ error StatusCode: 400, RequestID: (TheRequestIDgoesHere, I removed it), api
│ error ValidationError: 1 validation error detected: Value at
│ 'webIdentityToken' failed to satisfy constraint: Member must have length
│ greater than or equal to 4

I've been trying to solve this problem for days, and I still have no clue how to fix it.

Note: I'm running the docker compose version of Terrakube 2.22. (I'm using Terrakube on Docker in an EC2 instance because I intend to use it to manage my EKS clusters.) The EC2 instance has a public IP, but its inbound access is filtered, allowing only Bitbucket webhooks and a few other internal services.

alfespa17 commented 3 weeks ago

Did you check if your PEM file is in "pkcs8" format???

isbig5 commented 3 weeks ago

Yes, I did. I took a first look when it was generated and it looked fine.

After I got the error, I tried the command from the docs to convert the key to the pkcs8 format:

openssl pkcs8 -topk8 -inform PEM -outform PEM -nocrypt -in private_temp.pem -out private.pem

But it looks like that wasn't the root cause of the error.

alfespa17 commented 3 weeks ago

You can enable debugging for dynamic credentials in the API container using the following env variable, so you can see the internal values inside the logs

JAVA_TOOL_OPTIONS="-Dlogging.level.org.terrakube.api.plugin.token.dynamic=DEBUG -Dlogging.level.org.terrakube.api.plugin.scheduler.job.tcl.executor=DEBUG"

isbig5 commented 3 weeks ago

All right, I'm going to add these debug args and try again. I'll update here with the results.

Thanks

alfespa17 commented 3 weeks ago

Using that you should be able to see the values from this part of the code

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L101

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L103

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L104

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L105

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/scheduler/job/tcl/executor/ExecutorService.java#L194

That can help you to validate if everything is working correctly

isbig5 commented 3 weeks ago

I got the logs with the debug output. I noticed that the accessToken value shows as ****. Is that normal?

2024-10-17T16:56:01.626Z DEBUG 1 --- [ryBean_Worker-4] o.t.a.p.s.j.t.executor.ExecutorService   : Sending Job: /n ExecutorContext(commandList=null, type=terraformPlan, organizationId=281f4a85-5974-43da-bc1c-4890feeabbd0, workspaceId=fb5d00fb-e9c9-4cda-abdf-ad46ed754cb8, jobId=10, stepId=3646c137-ebaf-440b-87dc-b1f73b95a88e, terraformVersion=1.9.7, source=https://bitbucket.org/company/iac-terrakube-teste, branch=develop, folder=/, vcsType=BITBUCKET, refresh=true, refreshOnly=false, showHeader=true, accessToken=****, moduleSshKey=****, commitId=null, tofu=false, agentUrl=https://terrakube-executor.mydomain.com/api/v1/terraform-rs, environmentVariables={ENABLE_DYNAMIC_CREDENTIALS_AWS=true, AWS_REGION=us-east-1, organizationName=company-develop, WORKLOAD_IDENTITY_ROLE_AWS=arn:aws:iam::1234567890:role/terrakube-roleeft, TF_IN_AUTOMATION=1, AWS_WEB_IDENTITY_TOKEN_FILE=/home/cnb/.terraform-spring-boot/executor/281f4a85-5974-43da-bc1c-4890feeabbd0/fb5d00fb-e9c9-4cda-abdf-ad46ed754cb8/terrakube_config_dynamic_credentials_aws.txt, AWS_ROLE_ARN=arn:aws:iam::1234567890:role/terrakube-roleeft, workspaceName=iac-terrakube-teste, WORKLOAD_IDENTITY_AUDIENCE_AWS=aws.workload.identity, TERRAKUBE_AWS_CREDENTIALS_FILE=}, variables={})

AWS_WEB_IDENTITY_TOKEN_FILE returns a path, which I tried to follow in the API container shell and then in the executor shell. I found the directory but not the file inside it. TERRAKUBE_AWS_CREDENTIALS_FILE is empty.

It looks like the credentials file variable is coming back empty... Maybe I messed up some part of the process?

alfespa17 commented 3 weeks ago

Check for this specific line: log.debug("TERRAKUBE_AWS_CREDENTIALS_FILE: {}", awsWebIdentityToken);. It prints the token that will be used, so in the logs you should see a base64-encoded JSON Web Token.

public HashMap<String, String> generateDynamicCredentialsAws(Job job, HashMap<String, String> workspaceEnvVariables) {
    String awsWebIdentityToken = generateJwt(
            job.getOrganization().getName(),
            job.getWorkspace().getName(),
            workspaceEnvVariables.get("WORKLOAD_IDENTITY_AUDIENCE_AWS"),
            job.getOrganization().getId().toString(),
            job.getWorkspace().getId().toString(),
            job.getId()
    );

    log.debug("TERRAKUBE_AWS_CREDENTIALS_FILE: {}", awsWebIdentityToken);

    workspaceEnvVariables.put("TERRAKUBE_AWS_CREDENTIALS_FILE", awsWebIdentityToken);
    workspaceEnvVariables.put("AWS_ROLE_ARN", workspaceEnvVariables.get("WORKLOAD_IDENTITY_ROLE_AWS"));
    workspaceEnvVariables.put("AWS_WEB_IDENTITY_TOKEN_FILE", getDefaultExecutorPath(job) + "/terrakube_config_dynamic_credentials_aws.txt");

    return workspaceEnvVariables;
}

If you don't see the token, I guess it is not being generated correctly here:

https://github.com/AzBuilder/terrakube/blob/b46286656a1fbd0fb4e7176e723c15605ebbb6bd/api/src/main/java/org/terrakube/api/plugin/token/dynamic/DynamicCredentialsService.java#L61

alfespa17 commented 3 weeks ago

Looking at your logs, I think the value is empty:

[image]

alfespa17 commented 3 weeks ago

I guess you should see some error message in your logs.

isbig5 commented 3 weeks ago

[image] I found an error message about the private key, and the credentials file variable is empty.

isbig5 commented 3 weeks ago

It looks like there's still something wrong with the key. I don't know if it's permissions (I put the .pem in the /etc/ folder); the key itself looks normal.

isbig5 commented 3 weeks ago

I found some other errors, but nothing related to dynamic credentials. They look related to not finding something in the S3 state bucket (it is authenticated with its own secret key and ID, and is already writing to the bucket).

alfespa17 commented 3 weeks ago

You don't need to worry about those.

isbig5 commented 3 weeks ago

I'm going to move the .pem to another location to see if I get a different output.

alfespa17 commented 3 weeks ago

Maybe it is a file permission issue. The app runs as the following user:

uid=1002(cnb) gid=1000(cnb) groups=1000(cnb)
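
One way to confirm a permission problem like this is to check, as the same user the API runs as, whether the key file is actually readable. The following is only a minimal sketch under stated assumptions: the class name KeyFileCheck and the default path /workspace/private.pem are hypothetical, and the POSIX permission lookup only works on POSIX file systems such as the Linux container here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

public class KeyFileCheck {

    // Reports owner, permission bits, and whether the current process user can read the file.
    static String describe(Path file) throws IOException {
        if (!Files.exists(file)) {
            return file + ": does not exist";
        }
        String owner = Files.getOwner(file).getName();
        String perms = PosixFilePermissions.toString(Files.getPosixFilePermissions(file));
        return file + ": owner=" + owner + " perms=" + perms + " readable=" + Files.isReadable(file);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real key path as the first argument.
        Path key = Path.of(args.length > 0 ? args[0] : "/workspace/private.pem");
        System.out.println(describe(key));
    }
}
```

The result is only meaningful when the check runs under the same uid as the API process (cnb in the output above); run as root, a file can look readable even though the application cannot open it.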

isbig5 commented 3 weeks ago

Yeah, true... that may be it. I'm going to try applying a chmod + chown here.

isbig5 commented 3 weeks ago

And yes, it looks like permissions were exactly the problem. I don't know why I thought putting the keys in /etc/ would be a good idea, haha. They're now in the /workspace folder, owned by the user you mentioned above.

The debug output now gives me a value for TERRAKUBE_AWS_CREDENTIALS_FILE.

When I run a plan, it still gives me an error, but it may be something in my account blocking the requests because of their volume. I'll try recreating the OIDC config and roles to see if that solves the problem.

│ Error: No valid credential sources found
│
│   with provider["registry.terraform.io/hashicorp/aws"],
│   on main.tf line 1, in provider "aws":
│    1: provider "aws" {
│
│ Please see https://registry.terraform.io/providers/hashicorp/aws
│ for more information about providing credentials.
│
│ Error: failed to refresh cached credentials, failed to retrieve
│ credentials, operation error STS: AssumeRoleWithWebIdentity, exceeded
│ maximum number of attempts, 3, https response error StatusCode: 400,
│ RequestID: (RequestIDHere-I-Removed-It), InvalidIdentityToken:
│ Couldn't retrieve verification key from your identity provider, please
│ reference AssumeRoleWithWebIdentity documentation for requirements

alfespa17 commented 3 weeks ago

You need to check these endpoints. AWS should be able to reach them without any kind of restriction:

https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/jwks
https://TERRAKUBE.MYSUPERDOMAIN.COM/.well-known/openid-configuration

isbig5 commented 3 weeks ago

All right, I'll look into it. I can't leave Terrakube open to the whole internet because of company policy, so I'm going to check whether AWS publishes an IP range list that I can whitelist.

alfespa17 commented 3 weeks ago

@isbig5 If you allow only those 2 endpoints it should work

You can check that here

isbig5 commented 3 weeks ago

Thanks! My Terrakube is finally working. I ended up using CloudFront plus some WAF rules to restrict access, and limited the instance's inbound rules to accept traffic only from the CloudFront prefix list. Unfortunately, it looks like the IAM identity provider doesn't have a specific range of IPs and runs on the EC2 range, and allowing the whole EC2 range directly on the instance would be too risky for me.

Another question: is it possible to create the Terrakube trust policy for all workspaces in the organization, and not only one? I would like to set this role in the global variables for my whole organization.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::1234567890:oidc-provider/terrakube-api.mydomain.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "terrakube-api.mydomain.com:sub": "organization:company-develop:workspace:iac-terrakube-teste",
                    "terrakube-api.mydomain.com:aud": "aws.workload.identity"
                }
            }
        }
    ]
}

Can I remove the "workspace:iac-terrakube-teste" part and allow all workspaces? If yes, how?

alfespa17 commented 3 weeks ago

Hello @isbig5, I guess it should work: if you remove the workspace:iac-terrakube-teste part, you should be able to use the role for any workspace inside the organization.

Just to clarify, I am not an AWS expert XD. I just implemented the dynamic credentials following this documentation.

In the end, AWS will check the subject of the JWT that Terrakube generates in this part of the code, and validate the signature using the endpoint /.well-known/jwks

log.debug("TERRAKUBE_AWS_CREDENTIALS_FILE: {}", awsWebIdentityToken);

You could decode the JWT to check its content and see all the values.
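
To inspect the token printed by that debug line, a minimal Java sketch such as the following could decode the payload segment. It assumes the token is a standard three-part JWT (header.payload.signature, base64url-encoded); the class name JwtPayloadPeek and the sample token built in main are hypothetical, and the code only decodes the payload, it does not verify the signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class JwtPayloadPeek {

    // Returns the JSON text of the JWT payload (the second dot-separated segment).
    // Decoding only: the signature is NOT checked here.
    static String decodePayload(String jwt) {
        String[] parts = jwt.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 dot-separated segments, got " + parts.length);
        }
        // JWT segments are base64url without padding; the URL decoder accepts that.
        byte[] json = Base64.getUrlDecoder().decode(parts[1]);
        return new String(json, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample token assembled locally so the sketch runs without a real token.
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String header = enc.encodeToString("{\"alg\":\"RS256\"}".getBytes(StandardCharsets.UTF_8));
        String payload = enc.encodeToString(
                "{\"sub\":\"organization:company-develop:workspace:iac-terrakube-teste\",\"aud\":\"aws.workload.identity\"}"
                        .getBytes(StandardCharsets.UTF_8));
        String sample = header + "." + payload + ".signature";

        // Pass the token from the debug log as the first argument to inspect a real one.
        System.out.println(decodePayload(args.length > 0 ? args[0] : sample));
    }
}
```

The sub and aud values in the decoded payload are the ones that the conditions in the IAM trust policy have to match.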

The documentation mentions something related to the trust policy here
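
A hedged note on scoping the role to every workspace in the organization: IAM's StringEquals operator compares the whole value, so simply trimming the workspace part from the sub string would likely stop matching the token Terrakube issues. A common pattern in OIDC trust policies is to move the sub check under StringLike and use a * wildcard. The fragment below is a sketch that assumes the sub format shown earlier in this thread (organization:ORG:workspace:NAME) and reuses the placeholder account ID and domain; it has not been tested against Terrakube.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::1234567890:oidc-provider/terrakube-api.mydomain.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "terrakube-api.mydomain.com:aud": "aws.workload.identity"
                },
                "StringLike": {
                    "terrakube-api.mydomain.com:sub": "organization:company-develop:workspace:*"
                }
            }
        }
    ]
}
```

Keeping the aud check under StringEquals and the organization name in the sub pattern means the role stays limited to tokens issued for that one organization.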