aws-samples / aws-eda-slurm-cluster

AWS Slurm Cluster for EDA Workloads
MIT No Attribution

[BUG] Getting EC2 instance info fails #200

cartalla closed this issue 5 months ago

cartalla commented 5 months ago

Describe the bug

Multiple price lists are returned for an instance type when only one is expected.

INFO:2024-02-05 17:18:00,237: Getting EC2 instance info for us-east-1 (US East (N. Virginia))
WARNING:2024-02-05 17:20:02,175: No pricelist for mac1.metal us-east-1 (US East (N. Virginia)). Instance type may not be available in this region.
WARNING:2024-02-05 17:20:02,197: No pricelist for mac2-m2.metal us-east-1 (US East (N. Virginia)). Instance type may not be available in this region.
WARNING:2024-02-05 17:20:02,224: No pricelist for mac2-m2pro.metal us-east-1 (US East (N. Virginia)). Instance type may not be available in this region.
WARNING:2024-02-05 17:20:02,253: No pricelist for mac2.metal us-east-1 (US East (N. Virginia)). Instance type may not be available in this region.
Traceback (most recent call last):
  File "app.py", line 32, in <module>
    termination_protection = True,
  File "/workplace/cartalla/github/aws-eda-slurm-cluster/source/.venv/lib64/python3.7/site-packages/jsii/_runtime.py", line 118, in __call__
    inst = super(JSIIMeta, cast(JSIIMeta, cls)).__call__(*args, **kwargs)
  File "/workplace/cartalla/github/aws-eda-slurm-cluster/source/cdk/cdk_slurm_stack.py", line 127, in __init__
    self.check_regions_config()
  File "/workplace/cartalla/github/aws-eda-slurm-cluster/source/cdk/cdk_slurm_stack.py", line 920, in check_regions_config
    self.eC2InstanceTypeInfo = EC2InstanceTypeInfo(self.compute_regions, get_savings_plans=False, json_filename='/tmp/instance_type_info.json', debug=False)
  File "/workplace/cartalla/github/aws-eda-slurm-cluster/source/EC2InstanceTypeInfoPkg/EC2InstanceTypeInfo.py", line 93, in __init__
    self.get_instance_type_and_family_info(region)
  File "/workplace/cartalla/github/aws-eda-slurm-cluster/source/EC2InstanceTypeInfoPkg/EC2InstanceTypeInfo.py", line 195, in get_instance_type_and_family_info
    raise RuntimeError("Number of PriceLists > 1 for {}".format(instanceType))
RuntimeError: Number of PriceLists > 1 for p4d.24xlarge

cartalla commented 5 months ago

Added code to print out the duplicate pricing lists:

INFO:2024-02-05 16:50:33,717: priceList[0]:
{
    "product": {
        "productFamily": "Compute Instance",
        "attributes": {
            "enhancedNetworkingSupported": "No",
            "intelTurboAvailable": "Yes",
            "memory": "1152 GiB",
            "dedicatedEbsThroughput": "19000 Mbps",
            "vcpu": "96",
            "classicnetworkingsupport": "false",
            "capacitystatus": "Used",
            "locationType": "AWS Region",
            "storage": "8 x 1000 SSD",
            "instanceFamily": "GPU instance",
            "operatingSystem": "Linux",
            "intelAvx2Available": "Yes",
            "regionCode": "us-east-1",
            "physicalProcessor": "Intel Xeon Platinum 8275L",
            "clockSpeed": "3 GHz",
            "ecu": "345",
            "networkPerformance": "400 Gigabit",
            "servicename": "Amazon Elastic Compute Cloud",
            "gpuMemory": "NA",
            "vpcnetworkingsupport": "true",
            "instanceType": "p4d.24xlarge",
            "tenancy": "Shared",
            "usagetype": "BoxUsage:p4d.24xlarge",
            "normalizationSizeFactor": "192",
            "gpu": "8",
            "intelAvxAvailable": "Yes",
            "processorFeatures": "Intel AVX; Intel AVX2; Intel AVX512; Intel Turbo",
            "servicecode": "AmazonEC2",
            "licenseModel": "No License required",
            "currentGeneration": "Yes",
            "preInstalledSw": "NA",
            "location": "US East (N. Virginia)",
            "processorArchitecture": "64-bit",
            "marketoption": "OnDemand",
            "operation": "RunInstances",
            "availabilityzone": "NA"
        },
        "sku": "H7NGEAC6UEHNTKSJ
INFO:2024-02-05 16:50:33,718: priceList[1]:
{
    "product": {
        "productFamily": "Compute Instance",
        "attributes": {
            "enhancedNetworkingSupported": "No",
            "intelTurboAvailable": "Yes",
            "memory": "1152 GiB",
            "dedicatedEbsThroughput": "19000 Mbps",
            "vcpu": "96",
            "classicnetworkingsupport": "false",
            "capacitystatus": "Used",
            "locationType": "AWS Region",
            "storage": "8 x 1000 SSD",
            "instanceFamily": "GPU instance",
            "operatingSystem": "Linux",
            "intelAvx2Available": "Yes",
            "regionCode": "us-east-1",
            "physicalProcessor": "Intel Xeon Platinum 8275L",
            "clockSpeed": "3 GHz",
            "ecu": "345",
            "networkPerformance": "400 Gigabit",
            "servicename": "Amazon Elastic Compute Cloud",
            "gpuMemory": "NA",
            "vpcnetworkingsupport": "true",
            "instanceType": "p4d.24xlarge",
            "tenancy": "Shared",
            "usagetype": "BoxUsage:p4d.24xlarge",
            "normalizationSizeFactor": "192",
            "gpu": "8",
            "intelAvxAvailable": "Yes",
            "processorFeatures": "Intel AVX; Intel AVX2; Intel AVX512; Intel Turbo",
            "servicecode": "AmazonEC2",
            "licenseModel": "No License required",
            "currentGeneration": "Yes",
            "preInstalledSw": "NA",
            "location": "US East (N. Virginia)",
            "processorArchitecture": "64-bit",
            "marketoption": "CapacityBlock",
            "operation": "RunInstances:CB",
            "availabilityzone": "NA"
        },
        "sku": "YSXJGN78QTXNVGDQ"
    },

The two entries differ in the marketoption and operation attributes. The extra price list has marketoption set to CapacityBlock instead of OnDemand, and operation set to RunInstances:CB instead of RunInstances.
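
A quick way to confirm which attributes differ is to compare the two entries key by key. This is a minimal sketch, assuming each entry has already been parsed into a dict shaped like the output above. `diff_attributes` is a hypothetical helper for illustration, not code from the repository, and the sample dicts are trimmed copies of the two entries.

```python
def diff_attributes(entry_a, entry_b):
    """Return {attribute: (value_in_a, value_in_b)} for attributes that differ."""
    attrs_a = entry_a["product"]["attributes"]
    attrs_b = entry_b["product"]["attributes"]
    all_keys = sorted(set(attrs_a) | set(attrs_b))
    return {
        key: (attrs_a.get(key), attrs_b.get(key))
        for key in all_keys
        if attrs_a.get(key) != attrs_b.get(key)
    }


# Trimmed copies of priceList[0] and priceList[1] from the log above.
on_demand = {"product": {"attributes": {
    "instanceType": "p4d.24xlarge",
    "tenancy": "Shared",
    "marketoption": "OnDemand",
    "operation": "RunInstances",
}}}
capacity_block = {"product": {"attributes": {
    "instanceType": "p4d.24xlarge",
    "tenancy": "Shared",
    "marketoption": "CapacityBlock",
    "operation": "RunInstances:CB",
}}}

for name, (left, right) in diff_attributes(on_demand, capacity_block).items():
    print("{}: {} != {}".format(name, left, right))
```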

The fix is to add filters to the price list query so that the CapacityBlock entry is excluded.
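
A minimal sketch of what such filters could look like, assuming the price lists come from the AWS Price List Query API (`get_products` on the boto3 `pricing` client), which accepts `TERM_MATCH` filters on product attributes and returns each price list as a JSON string. The helper names and the choice of filter fields are assumptions for illustration, not the change that was actually merged.

```python
import json


def build_price_list_filters(instance_type, region_code):
    """Build TERM_MATCH filters that select only the plain On-Demand entry.

    Hypothetical helper; the field names come from the attribute dump above.
    """
    wanted = {
        "instanceType": instance_type,
        "regionCode": region_code,
        "marketoption": "OnDemand",
        "operation": "RunInstances",
    }
    return [
        {"Type": "TERM_MATCH", "Field": field, "Value": value}
        for field, value in wanted.items()
    ]


def keep_on_demand(price_lists):
    """Client-side fallback: drop entries that are not plain On-Demand.

    price_lists is a list of JSON strings, one per price list entry.
    """
    kept = []
    for raw in price_lists:
        attrs = json.loads(raw)["product"]["attributes"]
        if attrs.get("marketoption") == "OnDemand" and attrs.get("operation") == "RunInstances":
            kept.append(raw)
    return kept


# Assumed usage with a boto3 pricing client (not executed here):
#   response = pricing_client.get_products(
#       ServiceCode="AmazonEC2",
#       Filters=build_price_list_filters("p4d.24xlarge", "us-east-1"),
#   )
#   price_lists = keep_on_demand(response["PriceList"])
```

Filtering on the server side keeps the response small. The client-side check is a second guard in case a new market option shows up that the filters do not anticipate.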