aws-samples / 1click-hpc

Deploy your HPC Cluster on AWS in 20min. with just 1-Click.
MIT No Attribution
62 stars 44 forks source link

`pcluster create-cluster --wait` is exiting early #41

Open cmbrehm opened 1 year ago

cmbrehm commented 1 year ago

you can see this in bootstrap.log

+ /home/ec2-user/.local/bin/pcluster create-cluster --cluster-name hpc-1click-hpc365 --cluster-configuration config.us-east-1.yaml --rollback-on-failure false --wait
{
  "message": "The security token included in the request is expired"
}

Seems like if the command goes beyond 10-15 minutes, the EC2/Cloud9 credentials are cycling and the pcluster CLI doesn't take this into account. The --wait option seems to be deprecated in pcluster so we probably need to move to a polling approach that allows the credentials to refresh.

This causes the outer CloudFormation stack to fail initially, but it will succeed if it is Retried.