aws / aws-parallelcluster-ui

Can't attach a new IAM policy to the Head node #246

Open cartalla opened 1 year ago

cartalla commented 1 year ago

Description

I tried to add a new IAM policy to the head node of an existing cluster. When I do, the update fails and the cluster's CloudFormation (CFN) stack reports the following error:

API: iam:AttachRolePolicy User: arn:aws:sts::415233562408:assumed-role/parallelcluster-ui-3-6-1-ParallelClusterLambdaRol-LI4PKRASE0G9/parallelcluster-ui-3-6-1-P-ParallelClusterFunction-WHsr6AQh5Vmr is not authorized to perform: iam:AttachRolePolicy on resource: role edapc5-RoleHeadNode-3WCWVCK2CZG because no identity-based policy allows the iam:AttachRolePolicy action
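
The message names the caller, the denied action, and the target role. Note that the caller is an assumed-role session of the UI's Lambda function, not the person using the UI. The following is a minimal sketch for pulling those fields out of such a message; the regular expression and the helper name `parse_access_denied` are illustrative assumptions based only on the wording of the error above, not on any documented format.

```python
import re

# Pattern assumed from the wording of the error above; other AWS services
# may phrase access-denied messages differently.
ACCESS_DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: (?P<action>\S+)"
    r" on resource: (?:role )?(?P<resource>\S+)"
)


def parse_access_denied(message):
    """Return principal, action and resource from the message, or None."""
    match = ACCESS_DENIED.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # An assumed-role ARN looks like
    # arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
    fields = info["principal"].split(":", 5)
    if len(fields) == 6 and fields[5].startswith("assumed-role/"):
        parts = fields[5].split("/", 2)
        if len(parts) == 3:
            info["caller_role"] = parts[1]
            info["caller_session"] = parts[2]
    return info


message = (
    "API: iam:AttachRolePolicy User: arn:aws:sts::415233562408:assumed-role/"
    "parallelcluster-ui-3-6-1-ParallelClusterLambdaRol-LI4PKRASE0G9/"
    "parallelcluster-ui-3-6-1-P-ParallelClusterFunction-WHsr6AQh5Vmr "
    "is not authorized to perform: iam:AttachRolePolicy on resource: "
    "role edapc5-RoleHeadNode-3WCWVCK2CZG because no identity-based policy "
    "allows the iam:AttachRolePolicy action"
)

info = parse_access_denied(message)
print(info["action"])       # iam:AttachRolePolicy
print(info["resource"])     # edapc5-RoleHeadNode-3WCWVCK2CZG
print(info["caller_role"])  # parallelcluster-ui-3-6-1-ParallelClusterLambdaRol-LI4PKRASE0G9
```

The `caller_role` value is the role whose policies need to allow the action, which is where the rest of this thread ends up looking.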

Steps to reproduce the issue

  1. Create a cluster using the UI
  2. Stop the cluster
  3. Update the cluster. Add a new IAM policy to the head node.

Expected behaviour

The update succeeds and the new managed policy is attached to the head node role.

Actual behaviour

The update fails with the authorization error shown above.

YAML file generated by the ParallelCluster UI

Imds:
  ImdsSupport: v2.0
HeadNode:
  InstanceType: c6a.large
  Imds:
    Secured: true
  Ssh:
    KeyName: cartalla-us-east-1
  LocalStorage:
    RootVolume:
      VolumeType: gp3
  Networking:
    SubnetId: subnet-01736d0861ece4a42
    AdditionalSecurityGroups:
      - sg-0f7436a767536f5ab
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
      - Policy: arn:aws:iam::415233562408:policy/ParallelClusterAssetReadPolicy
  Dcv:
    Enabled: true
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: queue-1
      AllocationStrategy: lowest-price
      ComputeResources:
        - Name: queue-1-cr-1
          Instances:
            - InstanceType: c6a.large
          MinCount: 0
          MaxCount: 4
          DisableSimultaneousMultithreading: true
      ComputeSettings:
        LocalStorage:
          RootVolume:
            VolumeType: gp3
      Networking:
        SubnetIds:
          - subnet-01736d0861ece4a42
        PlacementGroup: {}
  SlurmSettings:
    Database:
      PasswordSecretArn: >-
        arn:aws:secretsmanager:us-east-1:415233562408:secret:ClusterPasswordSecret743CC6-jOMTmBFmV2HH-IYTAQK
      Uri: >-
        slurmedapc-slurmdbcluster120ff02f-yuyux7xgwbx7.cluster-c61o7abigj40.us-east-1.rds.amazonaws.com:3306
      UserName: slurm
    EnableMemoryBasedScheduling: true
    CustomSlurmSettings:
      - FederationParameters: fed_display
      - JobRequeue: 1
      - PreemptExemptTime: '0'
      - PreemptMode: REQUEUE
      - PreemptParameters: reclaim_licenses,send_user_signal,strict_order,youngest_first
      - PreemptType: preempt/partition_prio
      - PrologFlags: X11
      - SchedulerParameters: >-
          batch_sched_delay=10,bf_continue,bf_interval=30,bf_licenses,bf_max_job_test=500,bf_max_job_user=0,bf_yield_interval=1000000,default_queue_depth=10000,max_rpc_cnt=100,nohold_on_prolog_fail,sched_min_internal=2000000
      - ScronParameters: enable
      - AccountingStoreFlags: job_comment
      - PriorityType: priority/multifactor
      - PriorityWeightPartition: '100000'
      - PriorityWeightFairshare: '10000'
      - PriorityWeightQOS: '10000'
      - PriorityWeightAge: '1000'
      - PriorityWeightAssoc: '0'
      - PriorityWeightJobSize: '0'
Region: us-east-1
Image:
  Os: alinux2
Tags:
  - Key: parallelcluster-ui
    Value: 'true'

cartalla commented 1 year ago

The problem is in the Lambda role:

parallelcluster-ui-3-6-1-ParallelClusterApi-9ALFVY8XOAU9-PclusterPolicies-YXC-DefaultParallelClusterIamAdminPolicy-1M144USWNSB2D

I edited the statement below, adding a pattern (arn:aws:iam::415233562408:policy/*) that does not require the policy name to start with parallelcluster, and I was then able to attach my policy.

        {
            "Action": [
                "iam:PutRolePolicy",
                "iam:DeleteRolePolicy"
            ],
            "Resource": "arn:aws:iam::415233562408:role/parallelcluster/*",
            "Effect": "Allow",
            "Sid": "IamInlinePolicy"
        },
        {
            "Condition": {
                "ArnLike": {
                    "iam:PolicyARN": [
                        "arn:aws:iam::415233562408:policy/*",
                        "arn:aws:iam::415233562408:policy/parallelcluster*",
                        "arn:aws:iam::415233562408:policy/parallelcluster/*",
                        "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy",
                        "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
                        "arn:aws:iam::aws:policy/AWSBatchFullAccess",
                        "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess",
                        "arn:aws:iam::aws:policy/service-role/AWSBatchServiceRole",
                        "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role",
                        "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy",
                        "arn:aws:iam::aws:policy/service-role/AmazonEC2SpotFleetTaggingRole",
                        "arn:aws:iam::aws:policy/EC2InstanceProfileForImageBuilder",
                        "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
                    ]
                }
            },
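
To illustrate why the extra pattern matters, here is a minimal sketch of how an `ArnLike` condition on `iam:PolicyARN` might evaluate the two policies from the cluster configuration. It is a simplified model, not IAM's implementation: it relies on the documented behaviour that `ArnLike` is case-sensitive, compares the six colon-delimited ARN fields separately, and accepts `*` and `?` wildcards. The helper names are hypothetical, and treating `policy/*` as the only line missing from the default list is an assumption drawn from the comment above.

```python
import re


def _wildcard_match(pattern, value):
    """Case-sensitive match: '*' is any run of characters, '?' is exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def arn_like(pattern, arn):
    """Simplified ArnLike: compare the six colon-delimited fields one by one."""
    p_fields = pattern.split(":", 5)
    a_fields = arn.split(":", 5)
    if len(p_fields) != 6 or len(a_fields) != 6:
        return False
    return all(_wildcard_match(p, a) for p, a in zip(p_fields, a_fields))


def is_attachable(policy_arn, allowed_patterns):
    """True if any allow-list pattern matches the policy ARN."""
    return any(arn_like(p, policy_arn) for p in allowed_patterns)


ACCOUNT = "415233562408"
# A subset of the patterns in the statement above, without the broad
# "policy/*" line (assumed here to be the line that was added by hand).
DEFAULT_PATTERNS = [
    f"arn:aws:iam::{ACCOUNT}:policy/parallelcluster*",
    f"arn:aws:iam::{ACCOUNT}:policy/parallelcluster/*",
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore",
]
PATCHED_PATTERNS = [f"arn:aws:iam::{ACCOUNT}:policy/*"] + DEFAULT_PATTERNS

custom = f"arn:aws:iam::{ACCOUNT}:policy/ParallelClusterAssetReadPolicy"
managed = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"

print(is_attachable(custom, DEFAULT_PATTERNS))   # False
print(is_attachable(custom, PATCHED_PATTERNS))   # True
print(is_attachable(managed, DEFAULT_PATTERNS))  # True
```

Under this model the customer-managed policy is rejected by the default list because its name starts with an uppercase `ParallelCluster`, while the AWS-managed policy is accepted because it is listed explicitly.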

mtfranchetto commented 12 months ago

Hi @cartalla, this is a known issue with PCluster. Look at https://github.com/aws-samples/pcluster-manager/issues/384#issuecomment-1338008307 for more info. There's also another workaround available in the comment. I'm closing the issue for the moment. Feel free to open a new one if you need help.

cartalla commented 11 months ago

Why is this closed? The comment points to an issue in the old pcluster-manager repo, which is superseded by this one. Also, since this is such a simple bug fix, why hasn't it been fixed in the several releases since it was filed? I just hit this again while testing a new version of pcluster, and all I'm doing is following the instructions to use the Slurm DB.

regoawt commented 8 months ago

I have also just hit this issue, so not sure why it is closed! The workaround is very much a workaround, not a fix.

More fundamentally I'm wondering why there is a list of allowed policies that one can attach/detach via the UI in the first place, especially given there is no such restriction when creating/updating a cluster via the CLI. Is it because of the security implications of users assuming the UI IAM role when they are using the UI? It would be good to know what the rationale for this is.

UPDATE: Found this https://docs.aws.amazon.com/parallelcluster/latest/ug/iam-roles-in-parallelcluster-v3.html#iam-roles-in-parallelcluster-v3-privileged-iam-access

gmarciani commented 8 months ago

Hi @regoawt, thanks for bringing this to our attention. The rationale behind the limitation was security: the privileged IAM access mode is disabled by default. That rationale is still valid, but I agree with you all that we should provide a smoother customer experience for enabling it. We will share our plans for this here.

regoawt commented 8 months ago

Thanks for the reply @gmarciani, I can see why this is the default behaviour, makes sense. But yes, looking forward to having an easier way around it!