aws-ia / terraform-aws-control_tower_account_factory

AWS Control Tower Account Factory
Apache License 2.0

Security token expires in the middle of stack set instance creation operation #350

Closed: ruthchangadi closed this issue 1 year ago

ruthchangadi commented 1 year ago

AFT Version: 1.6.5

Terraform Version & Provider Versions

Please provide the outputs of terraform version and terraform providers from within your AFT environment.

terraform version

Terraform v1.4.6
on linux_amd64

terraform providers

    provider[registry.terraform.io/hashicorp/archive]
    provider[registry.terraform.io/hashicorp/local]
    provider[registry.terraform.io/hashicorp/random]
    provider[registry.terraform.io/hashicorp/aws]
    provider[registry.terraform.io/hashicorp/time]

Bug Description

The build fails with "Error: Failed to save state" and "Error saving state: failed to upload state: NoCredentialProviders: no valid providers in chain. Deprecated." while a stack set instance creation operation is still finishing.

To Reproduce

Steps to reproduce the behavior:

  1. Add a stack set resource and a stack set instance resource to an account customization for a particular account. The stack set instance is deployed to all accounts within the organization, roughly 80 accounts.
  2. Invoke customizations for the account and wait for the stack set instance creation operation to start. At some point the operation runs past the assumed role's maximum session duration (presumably one hour), and saving state fails. The stack set instance creation itself keeps running in CloudFormation and probably succeeds there.
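For anyone trying to reproduce this, here is a minimal sketch of the kind of customization step 1 describes. It assumes a SERVICE_MANAGED stack set created from an account registered as a CloudFormation delegated administrator (hence call_as = "DELEGATED_ADMIN") and deployed to the organization root. The variable org_root_id, the stack set name, and the placeholder template are hypothetical; none of them come from the reporter's wiz.tf.

```hcl
# Hypothetical: ID of the organization root (r-...) that the stack set targets.
variable "org_root_id" {
  type = string
}

resource "aws_cloudformation_stack_set" "example" {
  name             = "example"
  permission_model = "SERVICE_MANAGED"
  # Assumes this account is a CloudFormation delegated administrator.
  call_as = "DELEGATED_ADMIN"

  auto_deployment {
    enabled                          = true
    retain_stacks_on_account_removal = false
  }

  # Placeholder template with a single no-op resource.
  template_body = jsonencode({
    Resources = {
      Placeholder = {
        Type = "AWS::CloudFormation::WaitConditionHandle"
      }
    }
  })
}

resource "aws_cloudformation_stack_set_instance" "example" {
  stack_set_name = aws_cloudformation_stack_set.example.name
  call_as        = "DELEGATED_ADMIN"

  # A single instance resource fans out to every account under the root.
  deployment_targets {
    organizational_unit_ids = [var.org_root_id]
  }
}
```

Because one aws_cloudformation_stack_set_instance resource covers every account under the root, Terraform blocks on a single CloudFormation operation. That operation can run past an hour when accounts are processed only a few at a time.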

Expected behavior

Given a long enough maximum session duration on the assumed role, the build should finish saving state and exit cleanly.

Related Logs

│ Error: Failed to save state
│
│ Error saving state: failed to upload state: NoCredentialProviders: no valid
│ providers in chain. Deprecated.
│   For verbose messaging see aws.Config.CredentialsChainVerboseErrors
╵
╷
│ Error: Failed to persist state to backend
│
│ The error shown above has prevented Terraform from writing the updated
│ state to the configured backend. To allow for recovery, the state has been
│ written to the file "errored.tfstate" in the current working directory.
│
│ Running "terraform apply" again at this point will create a forked state,
│ making it harder to recover.
│
│ To retry writing this state, use the following command:
│     terraform state push errored.tfstate
│
╵
╷
│ Error: creating CloudFormation StackSet (example) Instance: failed to refresh cached credentials, operation error STS: AssumeRole, https response error StatusCode: 403, RequestID: 35d791fd-1894-44ed-9d30-dc6333368638, api error ExpiredToken: The security token included in the request is expired
│
│   with aws_cloudformation_stack_set_instance.example,
│   on wiz.tf line 34, in resource "aws_cloudformation_stack_set_instance" "example":
│   34: resource "aws_cloudformation_stack_set_instance" "example" {
│
╵
Releasing state lock. This may take a few moments...
╷
│ Error: Error releasing the state lock
│
│ Error message: failed to retrieve lock info: NoCredentialProviders: no
│ valid providers in chain. Deprecated.
│   For verbose messaging see aws.Config.CredentialsChainVerboseErrors
│
│ Terraform acquires a lock when accessing your state to prevent others
│ running Terraform to potentially modify the state at the same time. An
│ error occurred while releasing this lock. This could mean that the lock
│ did or did not release properly. If the lock didn't release properly,
│ Terraform may not be able to run future commands since it'll appear as if
│ the lock is held.
│
│ In this scenario, please call the "force-unlock" command to unlock the
│ state manually. This is a very dangerous operation since if it is done
│ erroneously it could result in two people modifying state at the same time.
│ Only call this command if you're certain that the unlock above failed and
│ that no one else is holding a lock.

Additional context

I'm not sure which role needs its maximum session duration increased. I've tried increasing it on both AWSAFTExecution and AWSAFTAdmin, with no luck.

ruthchangadi commented 1 year ago

It only just occurred to me that I can adjust the operation preferences for the stack instance deployment, so the maximum session duration is no longer a problem. I'd still like to know which role's session limit applies, though.
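A sketch of that workaround: an operation_preferences block on the stack set instance so that CloudFormation processes more accounts in parallel and the operation finishes inside the one-hour session. The percentages are illustrative and were not shared by the reporter. The org_root_id variable and the stack set resource name are hypothetical. The DELEGATED_ADMIN setting assumes a service-managed stack set deployed from a delegated-administrator member account.

```hcl
resource "aws_cloudformation_stack_set_instance" "example" {
  # Hypothetical names: adjust to the stack set and root ID actually in use.
  stack_set_name = aws_cloudformation_stack_set.example.name
  call_as        = "DELEGATED_ADMIN"

  deployment_targets {
    organizational_unit_ids = [var.org_root_id]
  }

  operation_preferences {
    # Process up to a quarter of the target accounts at once.
    max_concurrent_percentage = 25
    # As we read the StackSets docs, the default STRICT_FAILURE_TOLERANCE mode
    # starts concurrency at the lower of the max concurrency and the failure
    # tolerance plus one, so the tolerance is raised to match.
    failure_tolerance_percentage = 25
  }
}
```

Higher concurrency trades speed for blast radius: a faulty template can fail in more accounts before CloudFormation stops the operation.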

snebhu3 commented 1 year ago

@ruthchangadi during customization, the AWSAFTExecution role is assumed from the AFT management account into the target account with the default session duration (1 hour). Currently there is no way to increase the session duration on this STS AssumeRole call.

ruthchangadi commented 1 year ago

Thanks for the explanation, @snebhu3. I'll close this.