aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0
830 stars 312 forks source link

(3.0.0 - 3.2.1) ParallelCluster API cannot create new cluster #4588

Closed francesco-giordano closed 1 year ago

francesco-giordano commented 1 year ago

Bug description

In versions 3.0.0 to 3.2.1 of ParallelCluster, the API is by default deployed with a role that includes the IAM permissions required to perform the operations of the API. Due to a recent change in the CloudWatch service, the permissions required to create a cluster were expanded (to include logs:TagResource and logs:UntagResource), As a result, any existing API deployed using the default role will lack sufficient permissions to create new clusters. When creating a cluster, the CloudFormation stack will show resources with the following error:

User with accountId: XXX is not authorized to perform CreateLogGroup with Tags (Service: CloudWatchLogs, Status Code: 400, Request ID: 4c848ae1-5ff5-43dc-b67b-fd1a0f8cc33e)

If you are deploying the API with a custom role, or using the ParallelCluster CLI with specific IAM permissions, the logs:TagResource and logs:UntagResource actions need to be added to your policy.

Mitigation

To work around the issue on an existing deployed API, you will need to expand the permissions of the role used by the deployed Lambda.

  1. Navigate to the IAM Management Console > Policies
  2. Search for ParallelClusterClusterPolicy
  3. Search for the actions with Sid matching CloudWatchLogs
  4. Edit the matching policy to add the following two permissions to the logs:TagResource logs:UntagResource
  5. Search for ParallelClusterBuildImageManagedPolicy
  6. Search for the actions with Sid matching CloudWatch
  7. Repeat Step 4 above for the policy.

This will expand the policies used by the Lambda to include the new required policy to perform ParallelCluster operations.

You can find a detailed explanation and the mitigation of the problem wiki

francesco-giordano commented 1 year ago

A fix has been deploy for version 3.3.0 at 2022/11/23. Changing the title and description

lukeseawalker commented 1 year ago

Version 3.3.0 released with the fix, see changelog