aws-samples / 1click-hpc

Deploy your HPC Cluster on AWS in 20min. with just 1-Click.
MIT No Attribution

Some heads-up needed to customize the code #18

Closed. rvencu closed this issue 2 years ago

rvencu commented 2 years ago

Hi, I am adding customization to implement https://docs.aws.amazon.com/parallelcluster/latest/ug/launch-instances-odcr-v3.html

I am creating a uniquely named policy and attaching it to the HeadNode, and that works fine.

I am also creating a resource group that should contain all existing targeted capacity reservations. Should I use a query to populate it, or can I just attach an ARN with a wildcard in the last section?

The second and harder problem: I need to create the JSON that overrides the Slurm compute node settings. I can retrieve the current zone ID and account ID from the head node itself, but I need some way to pass in the cluster name or the group name so I do not have to hardcode it in the file. Currently the script looks like this: https://github.com/rvencu/1click-hpc/blob/main/modules/50.install.capacity.reservation.pool.sh

#!/bin/bash
set -e

# Look up the account ID, then derive the region from the availability zone
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
EC2_AVAIL_ZONE=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
EC2_REGION=$(echo "$EC2_AVAIL_ZONE" | sed 's/[a-z]$//')

# Override run_instance attributes
# Name of the group is still hardcoded, need a way to get variable from cloudformation here
cat > /opt/slurm/etc/pcluster/run_instances_overrides.json << EOF
{
    "compute-od-gpu": {
        "p4d-24xlarge": {
            "CapacityReservationSpecification": {
                "CapacityReservationTarget": {
                    "CapacityReservationResourceGroupArn": "arn:aws:resource-groups:$EC2_REGION:$ACCOUNT_ID:group/EC2CRGroup"
                }
            }
        }
    }
}
EOF

nicolaven commented 2 years ago

Hi, I am not sure I fully understood the first question, but I will try to answer.

I think you can just attach the ARN, unless you plan to change the ODCR frequently.

As per the documentation, I'd just run this:

aws resource-groups create-group --name EC2CRGroup \
    --configuration '{"Type":"AWS::EC2::CapacityReservationPool"}' '{"Type":"AWS::ResourceGroups::Generic", "Parameters": [{"Name": "allowed-resource-types", "Values": ["AWS::EC2::CapacityReservation"]}]}'

Then:

aws resource-groups group-resources --region REGION_ID --group EC2CRGroup \
    --resource-arns arn:aws:ec2:REGION_ID:ACCOUNT_ID:capacity-reservation/PLACEHOLDER_CAPACITY_RESERVATION
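If the reservations change often, the group could also be filled by a query instead of by hand. This is only a rough sketch, assuming the AWS CLI is configured on the machine and that every active reservation in the region belongs in the pool. The function names are made up for illustration and are not part of 1click-hpc.

```bash
#!/bin/bash
# Hypothetical helpers, not part of 1click-hpc.

# Print the ARN of every active capacity reservation in a region, one per line.
list_active_cr_arns() {
    local region="$1"
    aws ec2 describe-capacity-reservations \
        --region "$region" \
        --filters Name=state,Values=active \
        --query 'CapacityReservations[].CapacityReservationArn' \
        --output text | tr '\t' '\n'
}

# Read ARNs from stdin and add each one to the given resource group.
add_crs_to_group() {
    local region="$1" group="$2" arn
    while read -r arn; do
        # Skip blank lines (for example when no reservation was found).
        [ -n "$arn" ] || continue
        aws resource-groups group-resources \
            --region "$region" --group "$group" \
            --resource-arns "$arn" < /dev/null
    done
}

# Usage (makes real AWS calls):
#   list_active_cr_arns eu-west-1 | add_crs_to_group eu-west-1 EC2CRGroup
```

Adding the reservations one at a time keeps the loop simple; it would need to be re-run whenever a new reservation is created.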

and then create a policy like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RunInstancesInCapacityReservation",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:REGION_ID:ACCOUNT_ID:capacity-reservation/*",
                "arn:aws:resource-groups:REGION_ID:ACCOUNT_ID:group/*"
            ]
        }
    ]
}

Here you can attach the ARN directly; I am not sure it is worth querying for it and adding it dynamically.
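To avoid editing REGION_ID and ACCOUNT_ID by hand, the policy document could be rendered from a small function. This is a sketch only: `render_odcr_policy`, the output file name, and the policy name in the usage comment are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the policy above for one region/account pair.
render_odcr_policy() {
    local region="$1" account_id="$2"
    cat << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RunInstancesInCapacityReservation",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:${region}:${account_id}:capacity-reservation/*",
                "arn:aws:resource-groups:${region}:${account_id}:group/*"
            ]
        }
    ]
}
EOF
}

# Usage (the policy name is a placeholder):
#   render_odcr_policy eu-west-1 123456789012 > odcr-policy.json
#   aws iam create-policy --policy-name MyOdcrPolicy \
#       --policy-document file://odcr-policy.json
```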

The second problem can be solved either by querying for the cluster name (even just with the AWS CLI) or, if it is easier for you, by following the same approach I took here: https://github.com/aws-samples/1click-hpc/blob/main/scripts/Cloud9-Bootstrap.sh#L86

Basically, the cluster name is known in the Cloud9-Bootstrap script, so that script can replace a token in 50.install.capacity.reservation.pool.sh with the cluster name and then upload the file back to S3.
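The token replacement could look roughly like the sketch below. The token `@CR_GROUP_NAME@`, the function name, and the `$CLUSTER_NAME` and `$BUCKET` variables are assumptions for illustration; the real bootstrap script may use a different marker and different variable names.

```bash
#!/bin/bash
# Hypothetical helper: replace a placeholder token in a file, in place.
# The token is used as a sed pattern, so keep it free of regex
# metacharacters; token and value must not contain '|', '&' or '\'.
replace_token() {
    local file="$1" token="$2" value="$3" tmp
    tmp="$(mktemp)" || return 1
    # Write through a temp file, then copy back so the original file
    # keeps its permissions (e.g. the executable bit).
    sed "s|${token}|${value}|g" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage in the bootstrap script (variables are placeholders):
#   replace_token modules/50.install.capacity.reservation.pool.sh \
#       '@CR_GROUP_NAME@' "$CLUSTER_NAME"
#   aws s3 cp modules/50.install.capacity.reservation.pool.sh \
#       "s3://$BUCKET/modules/"
```

With this in place, the module script would carry the token where the group name is currently hardcoded.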

hope this helps

Thanks

rvencu commented 2 years ago

Yes, manually adding the ODCR to the group already works. So only the docs need updating, to describe the manual actions required after installation.

As for the second issue, I was indeed thinking of using S3 as well, but the AWS CLI approach also seems interesting.
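For the AWS CLI route, one possibility is to read the cluster name from the instance tags on the head node. This is a sketch under two assumptions that should be checked first: that the instance carries a `parallelcluster:cluster-name` tag, and that the head node role is allowed to call `ec2:DescribeTags`. The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the cluster name of the instance it runs on.
# Assumes the instance is tagged with parallelcluster:cluster-name.
get_cluster_name() {
    local region="$1" instance_id
    # Same metadata endpoint style as the script earlier in the thread.
    instance_id="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
    aws ec2 describe-tags \
        --region "$region" \
        --filters "Name=resource-id,Values=${instance_id}" \
                  "Name=key,Values=parallelcluster:cluster-name" \
        --query 'Tags[0].Value' --output text
}

# Usage on the head node:
#   CLUSTER_NAME="$(get_cluster_name "$EC2_REGION")"
```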

Thanks a lot

Richard
