Sceptre / sceptre

Build better AWS infrastructure
https://docs.sceptre-project.org

AWS STS Session hard limit duration of 1 hour for chained roles #1466

Open owitplat opened 6 months ago

owitplat commented 6 months ago

Subject of the issue

CloudFormation updates can take longer than 1 hour to complete for stacks that wait for a CloudFormation success signal. For example, on a Windows build that installs various applications at launch time, the success signal can take more than 1 hour to come back.

When sceptre 4 is run from an ECS container that assumes a role in another account to trigger a stack update that takes longer than 1 hour, the sceptre run fails with: "An error occurred (ExpiredToken) when calling the DescribeStacks operation: The security token included in the request is expired"

Increasing the session duration does not help, because AWS has a hard limit of 1 hour for chained roles, as described under "Role chaining" at https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html

When a longer duration is requested, sceptre fails with: "An error occurred (ValidationError) when calling the AssumeRole operation: The requested DurationSeconds exceeds the 1 hour session limit for roles assumed by role chaining."

The AWS provided solution is to launch sceptre using IAM user credentials, which are not subject to the 1 hour hard limit on chained roles. This is possible, but it means having to manage and rotate long-lived IAM user credentials, which is undesirable.

Sceptre v1 supported refreshing the temporary credentials via this commit: https://github.com/Sceptre/sceptre/commit/de616e46ee075a64febf32c6f92b47e2ae8ef3c9

It appears that this feature was not carried over into sceptre v2-v4.

We are currently migrating from sceptre 1 to sceptre 4, which has surfaced this issue.

Can the sceptre 1 feature be pulled into sceptre 4?

Your environment

Steps to reproduce

Create a sceptre 4 managed stack whose update takes > 1 hour (by way of a delayed CF success signal). Launch sceptre using temporary credentials, and set sceptre_role to another role that actually launches the stack.

Expected behaviour

sceptre should successfully create the stack.

Actual behaviour

sceptre fails after 1 hour with "An error occurred (ExpiredToken) when calling the DescribeStacks operation: The security token included in the request is expired"

Sceptre v1 would remove an expired session and create a new one as per https://github.com/Sceptre/sceptre/commit/de616e46ee075a64febf32c6f92b47e2ae8ef3c9
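
To illustrate the kind of behaviour being requested, here is a minimal sketch of "drop the expired session and create a new one". It is not the code from the v1 commit and not sceptre's current API: the names RefreshingSessionCache and call_with_refresh are hypothetical, and the session factory is passed in so the sketch does not depend on boto3. In a real run the factory would presumably call STS AssumeRole for the sceptre_role and build a new session from the returned credentials.

```python
import time


class RefreshingSessionCache:
    """Hypothetical helper: caches a session and rebuilds it before it expires.

    create_session is any zero-argument callable. With boto3 it could call
    sts.assume_role(...) and return a boto3.Session built from the returned
    temporary credentials (an assumption, not sceptre's implementation).
    """

    def __init__(self, create_session, lifetime_seconds=3600,
                 margin_seconds=300, clock=time.monotonic):
        self._create_session = create_session
        self._lifetime = lifetime_seconds  # 1 hour cap for chained roles
        self._margin = margin_seconds      # refresh a little early
        self._clock = clock
        self._session = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached session, creating a new one if it is stale."""
        now = self._clock()
        if self._session is None or now >= self._expires_at - self._margin:
            self._session = self._create_session()
            self._expires_at = now + self._lifetime
        return self._session

    def invalidate(self):
        """Forget the cached session so the next get() creates a fresh one."""
        self._session = None


def call_with_refresh(cache, operation):
    """Run operation(session); on an expired token, refresh and retry once."""
    try:
        return operation(cache.get())
    except Exception as exc:
        # botocore's ClientError exposes the error code under
        # exc.response["Error"]["Code"]; other exceptions are re-raised.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code not in ("ExpiredToken", "ExpiredTokenException"):
            raise
        cache.invalidate()
        return operation(cache.get())
```

With something like this in place, a long-running wait loop that polls DescribeStacks would ask the cache for a session on each poll rather than holding on to the session created at start-up, so a stack update that runs past the 1 hour limit would pick up fresh credentials.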

zaro0508 commented 6 months ago

> Can the sceptre 1 feature be pulled into sceptre 4?

If someone cares to create a PR for this it will definitely be considered for inclusion.