Closed by mission-coliveros 5 months ago
ParallelCluster uses the CloudFormation API `describe-stacks` via the boto3 package. If you have a lot of CloudFormation stacks in your account, the response from that API can exceed 1MB. Once the response exceeds 1MB, the API starts paging the results, which is why you see the `nextToken` property returned in the `list-clusters` response. Because this API doesn't support server-side filtering, ParallelCluster must filter on the client side. If no cluster stacks are returned in a given 1MB page, you may see an empty response for that page.
The solution is to take the `nextToken` property value from the response and pass it to a new request: `pcluster list-clusters --next-token NEXT_TOKEN`. Repeat this process until `nextToken` is null.
I am also told the issue with PCUI should be fixed in a future release.
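The loop described above can be scripted. The following is a minimal sketch, not an official tool: it assumes `pcluster list-clusters` prints a JSON object with a `clusters` list and an optional `nextToken` field, and the helper names `run_list_clusters` and `list_all_clusters` are hypothetical.

```python
import json
import subprocess


def run_list_clusters(region, next_token=None):
    """Run `pcluster list-clusters` once and return the parsed JSON response."""
    cmd = ["pcluster", "list-clusters", "-r", region]
    if next_token:
        cmd += ["--next-token", next_token]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def list_all_clusters(region, fetch_page=run_list_clusters):
    """Follow nextToken page by page and collect the clusters from every page."""
    clusters = []
    next_token = None
    while True:
        page = fetch_page(region, next_token)
        # A page may legitimately contain no clusters; keep going regardless.
        clusters.extend(page.get("clusters", []))
        next_token = page.get("nextToken")
        if not next_token:
            return clusters
```

Calling `list_all_clusters("us-east-1")` would then return the clusters from all pages, including those that come after an empty first page. The `fetch_page` parameter exists only so the loop can be exercised without the CLI installed.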
Thanks, we had already identified this issue a few days ago and implemented a workaround.
However, it seems like this logic could live in the ParallelCluster codebase itself, so that it handles the pagination of the CloudFormation response.
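To illustrate what handling this internally might look like, here is a rough sketch using the boto3 `describe_stacks` paginator with client-side filtering. It is not the actual ParallelCluster implementation. The tag key `parallelcluster:version` and the rule of skipping nested stacks via `ParentId` are assumptions about how cluster stacks could be recognized, and the function names are hypothetical.

```python
# Assumed tag key used to recognize cluster stacks; adjust if it differs.
VERSION_TAG = "parallelcluster:version"


def is_cluster_stack(stack):
    """Client-side filter: a top-level stack that carries the assumed version tag."""
    tag_keys = {tag["Key"] for tag in stack.get("Tags", [])}
    return "ParentId" not in stack and VERSION_TAG in tag_keys


def filter_cluster_stacks(pages):
    """Collect matching stacks across all pages, including pages with no matches."""
    return [
        stack
        for page in pages
        for stack in page.get("Stacks", [])
        if is_cluster_stack(stack)
    ]


def list_cluster_stacks(region):
    """Walk every describe_stacks page so callers never handle a token themselves."""
    import boto3  # imported lazily so the filter can be tested without AWS access

    client = boto3.client("cloudformation", region_name=region)
    paginator = client.get_paginator("describe_stacks")
    return filter_cluster_stacks(paginator.paginate())
```

The trade-off is latency: walking every page on each call is slower in accounts with many stacks, but the caller always gets the complete list.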
Hi, we have released a new version of PCUI that includes the fix for displaying all the clusters. You can deploy PCUI with the new version, and it should properly display your clusters.
Bug description and how to reproduce: We've had a production cluster up and running for about 4 months now. While it's online and able to scale up nodes and run jobs, for some reason it has stopped appearing when we run list commands via either the ParallelCluster UI or the CLI. Strangely enough, the `describe-cluster` command still works.
Additional context: `list-clusters` command. I think that it will try to create a new cluster under the same name.
The commands we're running are:
pcluster list-clusters -r us-east-1
pcluster describe-cluster -r us-east-1 --cluster-name test
pcluster describe-cluster -r us-east-1 --cluster-name <REDACTED_NAME_OF_PROBLEM_CLUSTER>