Closed jsmedmar closed 2 years ago
Hi jsmedmar,
Cluster creation times out after 30 minutes. To work around this, you could create the FSx file system in the AWS console and reference it with FileSystemId under SharedStorage to mount it to the cluster.
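The workaround above could look roughly like this in a ParallelCluster 3 configuration file. This is only a minimal sketch: the file system ID, the Name, and the MountDir are placeholder values and are not taken from this issue.

```yaml
SharedStorage:
  - MountDir: /fsx                          # placeholder mount point on the cluster nodes
    Name: existing-fsx                      # placeholder name for this storage entry
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-0123456789abcdef0    # placeholder; use the ID of the file system created in the console
```

Because the file system already exists, the cluster stack only has to mount it, so the FSx creation time should no longer count against the 30-minute limit.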
In a future release (pcluster 3.2), the node bootstrap timeout will be configurable.
Thank you, Hanwen
Hanwen, thanks so much for the quick response.
For now, what I'm doing is disabling the rollback. Even though the CloudFormation stack fails, the cluster works OK.
Looking forward to 3.2
Hello @jsmedmar, with ParallelCluster 3.2.0 we introduced the ability to customize the node bootstrap timeout as an experimental feature.
If you want to customize the timeouts, you can do so using the DevSettings section in your cluster configuration file, as follows:
DevSettings:
  Timeouts:
    HeadNodeBootstrapTimeout: 1234  # timeout in seconds
    ComputeNodeBootstrapTimeout: 1234  # timeout in seconds
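For the case reported in this issue, where FSx creation took more than an hour, the head node timeout would presumably need to exceed 3600 seconds. A sketch assuming a two-hour budget (7200 is an illustrative value, not a recommendation from the ParallelCluster team):

```yaml
DevSettings:
  Timeouts:
    # 2 h = 2 * 3600 s = 7200 s; assumed headroom over the >1 h FSx create time reported here
    HeadNodeBootstrapTimeout: 7200
```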
Perfect, will give it a try!
This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.
HeadNodeWaitCondition times out because the FSx file system takes too long to be created when ImportPath points to an S3 bucket with a very large number of files (>4 million objects; FSx create time >1h). The HeadNodeWaitCondition does not time out when ImportPath points to a small S3 bucket (FSx create time <20 mins).
This works:
This doesn't (import path from a very large S3 bucket):
Output of pcluster describe-cluster command.
Bug description and how to reproduce:
Create a cluster with an FSx file system whose ImportPath points to a very large S3 bucket.
If you are reporting issues about cluster creation failure or node failure:
fsx-24T-new-ami-logs-202207221356.tar.gz