Describe the bug
Behind the scenes of the ParallelCluster API, if you call the list-clusters command, it's basically just running aws cloudformation list-stacks. If you have too many CloudFormation stacks in your account, there isn't any logic to handle a paginated response (or a query argument to only get stacks with the tag of parallelcluster:cluster-name), so you have to pass in the CF nextToken yourself if you want to see all your clusters. As a consequence of this, when updating an existing cluster with this repo, the Lambda function might not see the target cluster, since it's not handling pagination (which to be fair, it shouldn't have to do in the first place)
Additional context
Here's an example of how you might revise the get_cluster_status function in the CreateCluster Lambda (I performed some testing on my end and this seems to work):
def get_clusters(cluster_region):
clusters = []
kwargs_dict = {"region": cluster_region}
while kwargs_dict:
clusters_dict = pc.list_clusters(**kwargs_dict)
if "nextToken" in clusters_dict:
kwargs_dict["next_token"] = clusters_dict["nextToken"]
else:
kwargs_dict = None
for cluster in clusters_dict['clusters']:
clusters.append(cluster)
return clusters
def get_cluster_status(cluster_name, cluster_region):
logger.info("Listing clusters to get cluster status")
cluster_status = None
for cluster_dict in get_clusters(cluster_region):
if cluster_dict['clusterName'] != cluster_name:
continue
logger.info(f"cluster_dict:\n{json.dumps(cluster_dict, indent=4)}")
cluster_status = cluster_dict['clusterStatus']
cluster_cloudformation_status = cluster_dict['cloudformationStackStatus']
logger.info(
f"{cluster_name} exists. Status={cluster_status} and cloudformation status={cluster_cloudformation_status}")
break
return cluster_status
Describe the bug Behind the scenes of the ParallelCluster API, if you call the
list-clusters
command, it's basically just runningaws cloudformation list-stacks
. If you have too many CloudFormation stacks in your account, there isn't any logic to handle a paginated response (or a query argument to only get stacks with the tag ofparallelcluster:cluster-name
), so you have to pass in the CF nextToken yourself if you want to see all your clusters. As a consequence of this, when updating an existing cluster with this repo, the Lambda function might not see the target cluster, since it's not handling pagination (which to be fair, it shouldn't have to do in the first place)Additional context Here's an example of how you might revise the
get_cluster_status
function in the CreateCluster Lambda (I performed some testing on my end and this seems to work):