aws / aws-parallelcluster

AWS ParallelCluster is an AWS-supported, open-source cluster management tool for deploying and managing HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0

ParallelCluster 2.9.x: custom tags not propagated to compute instances when scheduler is Slurm #2100

Closed: demartinofra closed this issue 3 years ago

demartinofra commented 4 years ago

Affected versions: 2.9.0, 2.9.1

In version 2.9.x, custom tags specified with `tags = {"key" : "value", "key2" : "value2"}` in the cluster section of the configuration file are not propagated to compute nodes when the Slurm scheduler is used.
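To make the shape of this setting concrete, here is a minimal sketch that parses the `tags` value and converts it into the `[{"Key": ..., "Value": ...}]` list format that CloudFormation and EC2 use for tags. The helper name `config_tags_to_tag_list` is hypothetical and not part of ParallelCluster; it assumes the value is a flat JSON object, as in the example above.

```python
import json


def config_tags_to_tag_list(raw_value):
    """Convert a `tags = {...}` JSON value into [{"Key": ..., "Value": ...}] dicts.

    Hypothetical helper for illustration only; not part of ParallelCluster.
    Keys and values are converted to strings, since AWS tags are strings.
    """
    parsed = json.loads(raw_value)
    if not isinstance(parsed, dict):
        raise ValueError("tags must be a JSON object")
    # Sort by key so the output order is deterministic
    return [{"Key": str(key), "Value": str(value)} for key, value in sorted(parsed.items())]


if __name__ == "__main__":
    print(config_tags_to_tag_list('{"key" : "value", "key2" : "value2"}'))
    # [{'Key': 'key', 'Value': 'value'}, {'Key': 'key2', 'Value': 'value2'}]
```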

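Apparently, while an Auto Scaling group (ASG) inherits the tags defined at the CloudFormation stack level, plain launch templates, which 2.9.x uses to launch Slurm compute nodes, do not apply those tags to the instances they launch.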
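One way to see this in the launch template data itself: the instance tags a template contributes are the ones listed in its `TagSpecifications` entries whose `ResourceType` is `"instance"`, so a stack tag that is not copied there does not come from the template. The sketch below extracts those tags from a `LaunchTemplateData` dictionary such as the one returned by `describe_launch_template_versions`. The function name `instance_tags_from_template_data` is hypothetical, and the sample data is made up for illustration.

```python
def instance_tags_from_template_data(lt_data):
    """Return {key: value} for the tags a launch template applies to instances.

    Only TagSpecifications entries with ResourceType "instance" are considered;
    entries for other resource types (for example "volume") are ignored.
    """
    tags = {}
    for spec in lt_data.get("TagSpecifications", []):
        if spec.get("ResourceType") == "instance":
            for tag in spec.get("Tags", []):
                tags[tag["Key"]] = tag["Value"]
    return tags


if __name__ == "__main__":
    # Made-up LaunchTemplateData: "key" is tagged only on volumes, so instances won't get it
    sample = {
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": "Compute"}]},
            {"ResourceType": "volume", "Tags": [{"Key": "key", "Value": "value"}]},
        ]
    }
    print(instance_tags_from_template_data(sample))  # {'Name': 'Compute'}
```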

demartinofra commented 4 years ago

While we work on a permanent fix for the next version, here is a script you can use to patch a running cluster so that the tags are applied to compute fleet nodes:

```python
import argparse

import boto3


def get_cloudformation_tags(region, stack_name):
    """Return the tags attached to the cluster's CloudFormation stack."""
    cfn_client = boto3.client("cloudformation", region_name=region)
    response = cfn_client.describe_stacks(StackName=stack_name)
    return response["Stacks"][0]["Tags"]


def get_launch_templates(region, stack_name):
    """Return the IDs of the launch templates defined in the compute fleet substack."""
    cfn_client = boto3.client("cloudformation", region_name=region)
    response = cfn_client.describe_stack_resources(
        StackName=stack_name,
        LogicalResourceId="ComputeFleetHITSubstack",
    )
    # The physical ID of a nested stack is its ARN: arn:aws:cloudformation:...:stack/<name>/<id>
    substack_name = response["StackResources"][0]["PhysicalResourceId"].split("/")[1]
    response_substack = cfn_client.describe_stack_resources(StackName=substack_name)
    return [
        resource["PhysicalResourceId"]
        for resource in response_substack["StackResources"]
        if resource["ResourceType"] == "AWS::EC2::LaunchTemplate"
    ]


def update_launch_template_tags(region, launch_templates, tags):
    """Create a new version of each launch template whose instance tags include `tags`."""
    ec2_client = boto3.client("ec2", region_name=region)
    for lt in launch_templates:
        response = ec2_client.describe_launch_template_versions(LaunchTemplateId=lt, Versions=["$Latest"])
        lt_data = response["LaunchTemplateVersions"][0]["LaunchTemplateData"]
        tag_specs = lt_data.setdefault("TagSpecifications", [])
        # Use the existing "instance" tag specification; create one only if it is missing
        instance_spec = next((spec for spec in tag_specs if spec.get("ResourceType") == "instance"), None)
        if instance_spec is None:
            instance_spec = {"ResourceType": "instance", "Tags": []}
            tag_specs.append(instance_spec)
        # Add only keys not already present: this avoids duplicate tag keys when the
        # script is re-run and leaves the tags ParallelCluster already sets unchanged
        existing_keys = {tag["Key"] for tag in instance_spec.setdefault("Tags", [])}
        instance_spec["Tags"].extend(tag for tag in tags if tag["Key"] not in existing_keys)
        ec2_client.create_launch_template_version(
            LaunchTemplateId=lt, SourceVersion="$Latest", LaunchTemplateData=lt_data
        )


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-r", "--region", help="AWS Region", required=True)
    parser.add_argument("-c", "--cluster-name", help="Cluster name", required=True)
    return parser.parse_args()


def main():
    args = parse_args()
    # ParallelCluster 2.x names the cluster stack "parallelcluster-<cluster name>"
    stack_name = "parallelcluster-" + args.cluster_name
    print("Retrieving CloudFormation tags...")
    stack_tags = get_cloudformation_tags(args.region, stack_name)
    print("Found the following tags: {0}".format(stack_tags))
    print("Retrieving LaunchTemplates...")
    launch_templates = get_launch_templates(args.region, stack_name)
    print("Found the following LaunchTemplates: {0}".format(launch_templates))
    print("Adding tags to LaunchTemplates...")
    update_launch_template_tags(args.region, launch_templates, stack_tags)
    print("Done")


if __name__ == "__main__":
    main()
```
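After running the script, you may want to confirm that the latest version of each launch template carries the stack tags for instances. A minimal sketch, assuming you pass in an EC2 client created with `boto3.client("ec2", region_name=...)`: it uses the same `describe_launch_template_versions` call as the script, while the function name `missing_instance_tags` is hypothetical. Taking the client as a parameter keeps the check testable without AWS credentials.

```python
def missing_instance_tags(ec2_client, launch_template_ids, expected_tags):
    """Map each launch template ID to the expected tag keys absent from its $Latest version.

    `expected_tags` uses the [{"Key": ..., "Value": ...}] format returned by
    CloudFormation describe_stacks. Only "instance" tag specifications are checked.
    Templates with no missing keys are left out of the result.
    """
    report = {}
    for lt_id in launch_template_ids:
        response = ec2_client.describe_launch_template_versions(
            LaunchTemplateId=lt_id, Versions=["$Latest"]
        )
        lt_data = response["LaunchTemplateVersions"][0]["LaunchTemplateData"]
        present = {
            tag["Key"]
            for spec in lt_data.get("TagSpecifications", [])
            if spec.get("ResourceType") == "instance"
            for tag in spec.get("Tags", [])
        }
        missing = sorted(tag["Key"] for tag in expected_tags if tag["Key"] not in present)
        if missing:
            report[lt_id] = missing
    return report
```

For example, `missing_instance_tags(boto3.client("ec2", region_name=args.region), launch_templates, stack_tags)` returning an empty dict means every template's latest version lists all the stack tags.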

Assuming you saved the script above as patch_cluster.py, run it as follows:

```bash
# Stop the compute fleet
pcluster stop cluster-name -r region
# Wait for the compute fleet to stop (check with: pcluster status cluster-name -r region)
# Patch the cluster
python patch_cluster.py -r region -c cluster-name
# Restart the compute fleet
pcluster start cluster-name -r region
```
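The "wait for the compute fleet to stop" step can also be scripted. The sketch below polls `pcluster status` through `subprocess` until the output reports a stopped fleet. It assumes the 2.9.x status output for Slurm clusters contains a line of the form `ComputeFleetStatus: STOPPED`; check this against your own `pcluster status` output before relying on it. The function names, the 30-second polling interval, and the 30-minute timeout are arbitrary choices for illustration.

```python
import subprocess
import time


def fleet_is_stopped(status_output):
    """Return True if the text has a 'ComputeFleetStatus: STOPPED' line (assumed format)."""
    for line in status_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "ComputeFleetStatus" and value.strip() == "STOPPED":
            return True
    return False


def wait_for_fleet_stop(cluster_name, region, poll_seconds=30, timeout_seconds=1800):
    """Poll `pcluster status` until the compute fleet is reported as stopped, or time out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["pcluster", "status", cluster_name, "-r", region],
            capture_output=True,
            text=True,
            check=False,
        )
        if fleet_is_stopped(result.stdout):
            return
        time.sleep(poll_seconds)
    raise TimeoutError("Compute fleet did not stop within {0} seconds".format(timeout_seconds))
```

Calling `wait_for_fleet_stop("cluster-name", "region")` between `pcluster stop` and running patch_cluster.py replaces the manual check.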

Note: if you run `pcluster update`, you need to reapply the patch.

tilne commented 4 years ago

Fixed by #2113. Marking this as pending release.

enrico-usai commented 3 years ago

The 2.10.0 release is out.